---
layout: post
author: Pascal Cuoq
date: 2011-08-29 14:35 +0200
categories: CompCert csmith value
format: xhtml
title: "CompCert gets a safe interpreter mode"
summary: "A first informal benchmark comparing CompCert's new reference interpreter with Frama-C's value analysis used in interpreter mode."
---
{% raw %}
<h2>Safe C interpreters galore</h2> 
<p>The last release of <a href="http://compcert.inria.fr/">CompCert</a> includes a safe C interpreter based on the compiler's reference semantics. Like <a href="http://code.google.com/p/c-semantics/">KCC</a> and Frama-C's value analysis, it detects a large class of undefined behaviors and some unspecified ones. Like them, in order to remain useful, it needs to make some difficult choices regarding what it accepts and what it rejects. C programs are seldom written entirely in the "strictly conforming" subset of the language, even when their authors advertise them as "portable". I once found on the Internet a "portable memcpy()" that started by decrementing one of its pointer arguments (a definite undefined behavior if the pointer points to the beginning of a block). This is one case that CompCert's interpreter and Frama-C's value analysis both accept. To warn, they both wait until you either dereference or compare the invalid pointer. I can't speak for Xavier, but the reason the value analysis waits is that too many programs do this, and too many programmers do not even accept that it is wrong when you point it out to them (to be clear, I am not talking about the industrial users interested in Frama-C for the verification of critical C programs here; that would be cause for worry. I am talking about the random C programmer you may find anywhere else). It would be very easy to make the value analysis warn as soon as the invalid pointer is computed, if someone desired strict enforcement of the standard.</p> 
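<p>To make the pattern concrete, here is a minimal sketch of the kind of "portable memcpy()" described above (a hypothetical reconstruction, not the actual code found online). The mere computation of <code>dst - 1</code>, when <code>dst</code> points to the first byte of a block, already falls outside the pointer arithmetic allowed by the standard, even though the invalid pointer is never dereferenced at that point.</p>
<pre>#include &lt;stddef.h&gt;

/* Hypothetical reconstruction of a "portable memcpy" in the style
   described above: it starts by decrementing both pointer arguments,
   then copies forward with a pre-increment idiom. Computing dst - 1
   when dst points to the beginning of a block is undefined behavior
   by itself; an interpreter that enforced the standard strictly could
   reject the function on its very first statement. */
void *portable_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = (unsigned char *)dst - 1;        /* undefined if dst is a block start */
    const unsigned char *s = (const unsigned char *)src - 1;
    while (n--)
        *++d = *++s;   /* steps back into the blocks before each access */
    return dst;
}
</pre>
<p>As explained above, both CompCert's interpreter and the value analysis accept the initial decrements and only warn if the out-of-bounds pointers end up being dereferenced or compared.</p>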
<p>While Xavier was implementing this C interpreter, I was coincidentally working on an interpreter mode for the value analysis. This mode simply deactivates a number of features that are not useful for the step-by-step interpretation of deterministic C programs: recording memory states and detecting infinite loops. The option does not try to change the representation of memory states, which are still sets of possible valuations for all the variables in memory; in interpreter mode, the states simply all happen to be singletons, singletons that the interpreter represents slightly inefficiently. In fact, after it was done, I realized that this option would double as a useful tool for some of the <a href="/index.php?post/2011/03/03/cosine-for-real">brute-force numerical precision analyses that are possible with the value analysis</a>, so that right now, I am struggling to find a name for the option. Since a benchmark was coming up, I also deactivated all consistency checks inside the value analysis. Lastly, I linked Frama-C with a miracle library that I will have to tell you more about in another blog post.</p> 
<h2>Benchmark</h2> 
<p>And there it was: the benchmark. This is an informal little thing, strictly for fun. There is no point in doing more serious measurements yet, since Xavier may still improve the representation of memory states in CompCert's interpreter. I used the three files below:</p> 
<ul> 
<li>SHA1 compression of a couple of small buffers, with a final check of the result: <a href="/assets/img/blog/imported-posts/interp_benchmark/sha1.c">sha1.c</a></li> 
<li>a program generated by the <a href="http://embed.cs.utah.edu/csmith/">Csmith</a> random C program generator. The program is in the intersection of what both interpreters support: <a href="/assets/img/blog/imported-posts/interp_benchmark/csmith.c">csmith.c</a></li> 
<li>the program <code>main(){ signed char c; for(c=1; c; c++); return c;}</code> (a standalone version is sketched just after this list)</li> 
</ul> 
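<p>For readers who want to try the third program themselves, here is a standalone, compilable version. The only additions are the explicit return type and a comment; the reasoning in the comment assumes the usual two's-complement wrap-around of <code>signed char</code>, which is implementation-defined in ISO C and which CompCert's semantics pin down.</p>
<pre>/* Standalone version of the one-line test program above. */
int main(void)
{
    signed char c;
    /* c takes the values 1, 2, ..., 127; the next increment produces 128,
       which does not fit in signed char. The conversion back is
       implementation-defined; on the usual two's-complement targets
       (and in CompCert's semantics) it wraps to -128. The loop then
       continues -128, -127, ..., -1, 0 and stops, so the program runs
       255 iterations and returns 0. */
    for (c = 1; c; c++)
        ;
    return c;
}
</pre>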
<p>For reference, the commands used are (the Frama-C commands will not work with Carbon; you'll need to remove the unknown options; option <code>-obviously-terminates</code> can be approximated in Carbon with <code>-slevel 999999999 -no-results</code>):</p> 
<pre>$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates \ 
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each" sha1.c 
... 
[value] Called Frama_C_show_each({{ &amp;"Test `%s': %s" }}, {{ &amp;test_input_1 }}, 
                                 {{ &amp;"passed" }}) 
... 
[value] Called Frama_C_show_each({{ &amp;"Test `%s': %s" }}, {{ &amp;test_input_2 }}, 
                                 {{ &amp;"passed" }}) 
... 
$ time ccomp -interp sha1.c  
... 
Test `abc': passed 
Test `abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq': passed 
... 
</pre> 
<pre>$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates -big-ints-hex 0 \ 
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each -Iruntime" csmith.c 
... 
[value] Called Frama_C_show_each({{ &amp;"checksum = %X" }}, {0x3F65930A}) 
... 
$ time ccomp -interp -Iruntime csmith.c 
checksum = 3F65930A 
</pre> 
<pre>$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates c.c 
$ time ccomp -interp c.c 
</pre> 
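<p>For completeness, here is what the first invocation would look like on a Carbon installation, applying the substitution mentioned above (an untested sketch; <code>frama-c</code> stands for Carbon's installed binary, and the remaining options are assumed to be accepted as-is):</p>
<pre>$ time frama-c -val -slevel 999999999 -no-results \ 
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each" sha1.c 
</pre>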
<h2>Results</h2> 
<p>The times shown are the medians of three runs.</p> 
<p>For the small program with the <code>signed char</code>, <code>ccomp -interp</code> takes 0.016s and the value analysis 0.096s. I am tempted to attribute the value analysis' bad score to start-up time: Frama-C is probably heavier than a compiler by now, and it loads all its available plug-ins at start-up. I could have improved this by disabling the plug-ins that are not necessary for interpreting C programs this way.</p> 
<p>For the SHA1 mini-benchmark, the value analysis takes 0.388s and <code>ccomp -interp</code> takes 0.296s.</p> 
<p>For the Csmith random program, the value analysis takes 0.848s and CompCert's interpreter 0.616s.</p> 
<h2>Conclusion</h2> 
<p>I think this is not bad at all for either of the two interpreters. I have examples where the value analysis does better than CompCert, but it wouldn't be fair to include them at this point, because they poke at a known weakness in CompCert's interpreter. On the other hand, I do not think that the value analysis will ever do much better than the results above. If I stumble on improvements that look like they will benefit both the analyzer and the interpreter mode, I will implement them immediately, but I do not think I should compromise the analyzer's performance to improve the interpreter. It is slow enough as it is. Just ask Boris.</p>
{% endraw %}