Commit 18e3988d authored by Augustin Lemesle's avatar Augustin Lemesle
Browse files

Merge branch 'master' of git.frama-c.com:pub/pub.frama-c.com

parents 2c2142da 36d8ca45
<header class="siteHeader" id="site_header">
<div id="header_iv_point" class="inviewTop"></div>
<span class="brandLogo"><a href="/index.html" rel="home" title=
"Frama-C"><img src="/assets/img/framac.gif" alt=""><span>Frama-C</span></a></span><a role="button" id=
"menu_toggle" class="menuToggle"><span class="open" onclick="openMenu()"><i></i><i></i><i></i></span><span class="close" onclick="closeMenu()"><i></i><i></i></span></a>
"Frama-C"><img src="/assets/img/framac.gif" alt=""><span>Frama-C</span></a></span>
<input class="burger-check" id="burger-check" type="checkbox"><label for="burger-check" class="burger"></label>
<nav id="menu" role="navigation">
<div class="menu-primary-meny-container">
<ul id="menu-primary-meny" class="menu">
<div>
<ul class="menu">
{% for item in site.data.nav %}
{% if item.id == include.header %}
<li class="menu-item menu-item-type-post_type menu-item-object-page current-menu-item page_item current_page_item">
<li class="menu-item current-menu-item">
{% else %}
<li class="menu-item menu-item-type-post_type menu-item-object-page">
<li class="menu-item">
{% endif %}
<a href="{{ item.link }}">{{ item.name }}</a>
</li>
{% endfor %}
{% if include.header == "download" %}
<li class="menu-item current-menu-item download">
{% else %}
<li class="menu-item download">
{% endif %}
<a href="/html/get-frama-c.html">Download</a>
</li>
</ul>
</div>
<a role="button" href="/html/get-frama-c.html" id="header_download_link" class="btnDownload"><span><i class=
"icon icon-curly-left"></i><i class="icon icon-download-arrow"></i><i class="icon icon-curly-right"></i></span></a>
</nav>
</header>
<script>
function openMenu(){
var x = document.getElementById("site_header");
x.classList.add("menuOpen");
}
function closeMenu(){
var x = document.getElementById("site_header");
x.classList.remove("menuOpen");
}
</script>
\ No newline at end of file
......@@ -37,34 +37,5 @@ typedef unsigned int size_t;
void* memcpy(void* region1 const void* region2 size_t n);
void* memset (void* dest int val size_t len);
</code></pre>
</blockquote> <p><a href="http://frama-c.com/download/value-analysis-Boron-20100401.pdf" hreflang="en">Boron's value analysis manual</a> has the beginning of a captivating tutorial and I figure that the suspense must be unbearable. I on the other hand already know how it ends. But I'm not telling yet. Neener-neener! Okay maybe later on this blog.</p>
<p>The new manual (with an expanded tutorial) will be released at the same time as Frama-C version Carbon which is not immediately for various technical reasons. The new version will have some amazing performance improvements but the Boron version is good enough for one of the steps that comes after the current tutorial ends:</p>
<blockquote><p><strong>Generalizing to arbitrary numbers of Update calls</strong></p>
<p>
As an exercise try to verify that there cannot be a run-time error
when hashing arbitrary contents by calling <code>Update(... 80)</code> an arbitrary number of times
between <code>Init</code> and <code>Final</code>. The general strategy
is to modify the C analysis context we have already written in a way
such that it is evident that it captures all such sequences of calls
and also in a way such that launched with adequate options
the value analysis does not emit any warning.</p>
<p>
The latter condition is harder than the former.
Observing results (with the GUI or observation functions
described in section 8.3) can help iterate towards
a solution. Be creative.</p>
</blockquote>
<p>Solution(s) in a few days.</p>
<p>Note: if you try the tutorial for the first time or on a new computer you may also appreciate this new paragraph in the documentation:</p>
<blockquote><p>Some compilation plaforms' headers define the names <code>memset</code>
and <code>memcpy</code> in a way that make it impossible to provide
your own implementation for these functions. If this happens for
yours you can try placing your own header in the analysis directory
under the name <code>string.h</code></p>
<pre><code>
typedef unsigned int size_t;
void* memcpy(void* region1 const void* region2 size_t n);
void* memset (void* dest int val size_t len);
</code></pre>
</blockquote>
{% endraw %}
......@@ -64,62 +64,5 @@ bool fill(value_type* a size_type n const value_type x);
return;
}
</pre>
<p>Hint: although strictly speaking this is not needed ghost code might be helpful.</p>
<h2>A simple example</h2>
<p>Broadly speaking, every location which is not mentioned in such a clause should have the same value when exiting the corresponding code as it when entering it. For instance, if we take the following declaration (extracted from <a href="http://first.fraunhofer.de/owx_download/acsl-by-example-5_1_0.pdf" hreflang="en">ACSL by Example</a>)</p>
<pre>/*@ assigns \
othing; */
bool equal(const value_type* a size_type n const value_type* b);
</pre>
<p>we claim that a call to <code>equal</code> won't modify the global state of the program. Note that a complete specification of the function would need some <code>requires</code> clause about the validity of pointers but we'll ignore that in this post to concentrate on <code>assigns</code>.</p>
<p>Now the implementation of equal would be the following:</p>
<pre>bool equal(const value_type* a size_type n const value_type* b)
{
/*@ loop assigns i; */
for (size_type i = 0; i &lt; n; i++)
if (a[i] != b[i])
return 0;
return 1;
}
</pre>
<p>This time we can not claim that the loop does not assign anything. More specifically it modifies the index <code>i</code>. Note that there is no contradiction with the fact that <code>equal</code> does not modify the global state of the program as <code>i</code> is only a local variable. One might argue that the scope of <code>i</code> being exactly the <code>for</code> loop it shouldn't be needed to have it mentioned in the <code>loop assigns</code>. However the ACSL manual explicitely says that in the case of a <code>for</code> loop loop annotations are evaluated after initialization takes place. In addition the other kind of loop annotations (<code>loop variant</code> and <code>loop invariant</code>) usually will refer to <code>i</code> and thus it wouldn't be coherent to have</p>
<pre>/*@ loop assigns \
othing;
loop variant n - i;
*/
for (size_type i = 0; i &lt; n; i++) ...
</pre>
<p>If nothing is modified by the loop it'd be difficult for the variant to strictly decrease at each step. In other words if you have a "normal" <code>for</code> loop (<em>i.e.</em> not something like <code>for(;;)</code>) the index must be part of the <code>loop assigns</code>.</p>
<h2>When the modified zone grows at each step</h2>
<p>Not all loops modify only their index. Writing <code>loop assigns</code> for loops that modify different locations at each step is a bit trickier than the <code>equal</code> example. If we now look at a function which fills a block of code we would have the following prototype:</p>
<pre>/*@ assigns a[0..n-1]; */
bool fill(value_type* a size_type n const value_type x);
</pre>
<p>meaning that we will modify the first <code>n</code> elements of the block pointed to by <code>a</code>.</p>
<p>When looking at the implementation a possible specification for the loop would be like this:</p>
<pre>void fill(value_type* a size_type n value_type x) {
/*@ loop assigns i a[0..(n-1)]; */
for (size_type i = 0; i &lt; n; i++)
a[i] = x;
return;
}
</pre>
<p>It is correct (at each loop steps the modified locations are included in the ones mentioned in the <code>loop assigns</code>) but this is an over-approximation. Let's see how we can refine it. The ACSL manual says that <code>loop assigns</code> are a special form of <em>inductive</em> invariants that is something which</p>
<ol>
<li>must be true when entering the loop</li>
<li>if supposed to be true at the beginning of a loop step still holds at the end of the same step.</li>
</ol>
<p>For a clause <code>loop assigns P;</code> this means that we have to prove that <code>\at(P end_loop_step)</code> contains both the sets of locations written by the current loop step and <code>\at(P begin_loop_step)</code> the set of locations written by the preceding steps. Coming back to our example at the end of the <code>i</code>-th step elements <code>0</code> to <code>i-1</code> have been modified (we have already incremented <code>i</code> at the point where we evaluate the locations) so that the correct and minimal clause is</p>
<pre>//@ loop assigns i a[0..(i-1)];
</pre>
<p>Note that for the code after the loop this does not change anything: when exiting the loop <code>i</code> is equal to <code>n</code> and thus we know that all elements up to <code>n-1</code> may have been modified. However when proving something (<em>e.g.</em> a <code>loop invariant</code>) inside the loop the latter <code>loop assigns</code> says that all elements between <code>i</code> and <code>n-1</code> are unchanged from their value at the very beginning of the loop while the former does provide any information on the values of these elements.</p>
<h2>A little exercise</h2>
<p>In order to keep up with the tradition of this blog here is a little exercise: what would be the most precise <code>loop assigns</code> of the following loop?</p>
<pre>void fill(char* a char x) {
for(;*a;a++)
*a = x;
return;
}
</pre>
<p>Hint: although strictly speaking this is not needed ghost code might be helpful.</p>
{% endraw %}
......@@ -138,136 +138,5 @@ lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
<pre>lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
</pre>
<p>Quiz 2: how should we get rid of the last alarm (that occurs in our file lib.c at line 10 function <code>memcpy</code> although it is caused by imprecision in the callers of that function)?</p>
<p>The files in their state at the end of this blog post are available <a href="/assets/img/blog/imported-posts/skein_verification.zip">here</a>. To rewind to earlier stages of the verification process simply erase <code>main.c</code> which is the only file we have modified so far.</p>
<p>In order to create a tradition of providing solutions to previous quizzes, this post is a partial answer to the question in <a href="/skein/value/2010/09/30/Progress-of-the-value-analysis-tutorial2" hreflang="en">this one about Frama-C's value analysis</a>.</p>
<p>To recap: at the end of the Boron tutorial we arrive at the <code>main</code> function below. This function is useful as an analysis context for the verification of hashing a 80-char message with Skein-256:</p>
<pre>#include "skein.h"
#define HASHLEN (32)
u08b_t msg[80];
void main(void)
{
int i;
u08b_t hash[HASHLEN];
for (i=0; i&lt;80; i++)
msg[i] = Frama_C_interval(0 255);
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&amp;skein_context HASHLEN * 8);
Skein_256_Update(&amp;skein_context msg 80);
Skein_256_Final( &amp;skein_context hash);
}
</pre>
<p>The context analysis above allocates as a global array and initializes a message <code>msg</code> with arbitrary contents. It also allocates a structure <code>skein_context</code> as a local variable to be used by the library for ongoing computations.</p>
<p>Our goal is now to generalize the analysis to arbitrary numbers of <code>Skein_256_Update</code> calls. Let us replace the single <code>Skein_256_Update</code> call in the above context by the loop below. This loop uses <code>Frama_C_interval(0 1)</code> as its condition. The value analysis is guaranteed to take into account the possible results 0 and 1 for all these calls. This ensures that the results of the analysis apply to all the different execution paths we are now interested in although these paths differ in length.</p>
<pre> while (Frama_C_interval(0 1))
{
for (i=0; i&lt;80; i++)
msg[i] = Frama_C_interval(0 255);
Skein_256_Update(&amp;skein_context msg 80);
}
</pre>
<p>Using the same command-line as in the tutorial generates warnings about possible undefined behaviors:</p>
<p><code>frama-c -val -slevel 100 *.c .../share/builtin.c &gt; log</code></p>
<pre>grep ssert log
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )1)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )2)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )3)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )4)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )5)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )6)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )7)): assert(TODO)
</pre>
<p>It is possible to increase the argument of option <code>-slevel</code> in order to unroll more iterations and postpone the moment in the analysis when precision is lost and alarms occur. However since the <code>while</code> loop is infinite there is no value for <code>-slevel</code> that lets this loop be analyzed with precision to its end.</p>
<p>In fact we do not want this <code>while</code> loop unrolled. We want the value analysis to do standard value analysis magic and extract properties that are true at all iterations of the loop without studying each iteration individually. In other words we want the value analysis not to unroll the while loop (but to continue unrolling the other finite loops so as to remain as precise as it was for the single <code>Skein_256_Update</code> call).</p>
<p>One way to obtain this result is to move the <code>for</code> loop outside function <code>main</code> so that the <code>while</code> loop is the only one that remains in <code>main</code> and to use the option <code>-slevel-function</code> introduced in Frama-C Boron. This option allows "semantic unrolling" to be tuned function by function. Here we can use it to tell the value analysis to unroll loops except inside <code>main</code>.</p>
<p>The complete analysis context becomes:</p>
<pre>#include "skein.h"
#define HASHLEN (32)
u08b_t msg[80];
void arbitrarize_msg(void)
{
for (int i=0; i&lt;80; i++) msg[i]=Frama_C_interval(0 255);
}
u08b_t hash[HASHLEN];
void main(void)
{
int i;
u08b_t hash[HASHLEN];
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&amp;skein_context HASHLEN * 8);
while (Frama_C_interval(0 1))
{
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
}
Skein_256_Final( &amp;skein_context hash);
}
</pre>
<p>The new command-line to use is:</p>
<p><code>frama-c -val -slevel 100 -slevel-function main:0 *.c &gt; log</code></p>
<p>The analysis takes a minute or so.</p>
<pre>grep ssert log
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )1)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )2)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )3)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )4)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )5)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )6)): assert(TODO)
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )7)): assert(TODO)
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
</pre>
<p>The warnings from the previous attempt are still there and they have a surprise guest at lib.c line 18 (in function <code>memset</code>). But since the only thing we have changed since the last attempt is to prevent a loop from being unrolled this was to be expected (you may have noticed that the analysis was a little bit faster also as a result of not unrolling the <code>while</code> loop).</p>
<p>Now let us try to get rid of these warnings for good. For this purpose we need information about what is causing them. Plunging in the code with <code>frama-c-gui</code> is a good way to obtain some of that but this approach lacks the ability to display values of variables for different calls to <code>Skein_256_Update</code> separately. Purely as an experiment in order to have a place to click to see the state at the first second ... fifth iteration let us change the <code>while</code> loop to:</p>
<pre> arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
while (Frama_C_interval(0 1))
{
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
}
</pre>
<p>Using the command-line <code>frama-c-gui -val -slevel 100 -slevel-function main:0 *.c</code> the first surprise is that we have accidentally eliminated most of the alarms by hand-unrolling the <code>while</code> loop. Indeed only the following now remains:</p>
<pre>lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
</pre>
<p>We can stick to the initial plan and observe the states after each of the first five iterations to understand what happened. This is no time to celebrate victory yet. For starters the verification is not done until all alarms have been eliminated. Also our investigative instrumentation has eliminated some execution paths — the executions paths with less than five iterations — that would still need to be studied separately before the verification can be called finished.</p>
<p>Here are the contents of variable <code>skein_context</code> before the first second and third calls:</p>
<pre>skein_context.h.hashBitLen ∈ {256; }
.h{.bCnt; .T[0]; } ∈ {0; }
.h.T[1] ∈ {8070450532247928832; }
.X[0] ∈ {4072681676153946182; }
.X[1] ∈ {5436642852965252865; }
.X[2] ∈ {2889783295839482789; }
.X[3] ∈ {6109786322067550404; }
.b[0..31] ∈ UNINITIALIZED
</pre>
<pre>skein_context.h.hashBitLen ∈ {256; }
.h.bCnt ∈ {16; }
.h.T[0] ∈ {64; }
.h.T[1] ∈ {3458764513820540928; }
{.X[0..3]; .b[0..15]; } ∈ [--..--]
.b[16..31] ∈ UNINITIALIZED
</pre>
<pre>skein_context.h.hashBitLen ∈ {256; }
.h.bCnt ∈ {32; }
.h.T[0] ∈ {128; }
.h.T[1] ∈ {3458764513820540928; }
{.X[0..3]; .b[0..31]; } ∈ [--..--]
</pre>
<p>Based on this information it is now clear where the "accessing uninitialized left-value" warnings came from: there is inside the structure <code>skein_context</code> a buffer that only becomes initialized after the first two calls to <code>Skein_256_Update</code>. And these alarms are now gone because these iterations when the buffer is not yet completely initialized are handled separately from the latter iterations when the buffer is initialized and presumably read from at some point.</p>
<p>Quiz 1: continuing the study of the first five calls to <code>Skein_256_Update</code> what used to cause the warning below and what made it disappear?</p>
<pre>lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
</pre>
<p>Quiz 2: how should we get rid of the last alarm (that occurs in our file lib.c at line 10 function <code>memcpy</code> although it is caused by imprecision in the callers of that function)?</p>
<p>The files in their state at the end of this blog post are available <a href="/assets/img/blog/imported-posts/skein_verification.zip">here</a>. To rewind to earlier stages of the verification process simply erase <code>main.c</code> which is the only file we have modified so far.</p>
{% endraw %}
......@@ -38,36 +38,5 @@ lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
<p>If you still have the <code>frama-c-gui</code> with the results for the main with unrolling lying around you can check that in that case the values for&nbsp;<code>ctx-&gt;h.bCnt</code>&nbsp;are {16; 32; } at the <code>if</code> and { 16; } at the call.</p>
<p>This leads us to the following conclusion: the imprecision that leads to the false alarm at lib.c:18 is caused by analyzing the call to <code>memset</code> with imprecise arguments in function <code>Sein_256_Final</code>. By unrolling the first few calls to <code>Skein_256_Update</code> we also forced these calls to happen systematically and that made the analysis conditions for <code>Skein_256_Final</code> simpler. When completing the verification with separate analyses of the omitted cases the arguments to <code>memset</code> are precise again (and complementary) so the false alarm is omitted again.</p>
<p>In fact if we kept investigating in the same fashion we would find that the case&nbsp;<code>ctx-&gt;h.bCnt==0</code> corresponds to the absence of any call to <code>Skein_256_Update</code>. This case is indeed covered by the <code>while</code> loop in the <code>main</code> analysis context without unrolling. The alarm we obtained when analyzing that loop does not mean there is a problem with the sequence <code>Skein_256_Init(...); Skein_256_Final(...);</code> only that some peculiar values happen in this case and that analyzing this case together with the others leads to false alarms. When verifying software that has already been tested and that its authors take pride on it makes sense to heuristically assume that alarms are false alarms when investigating them. One should however remain honest and continue to investigate until the alarm is understood—ideally until the value analysis can confirm that there was no alarm after all.</p>
<p>Next time the answer to the other quiz.</p>
<p>This is another episode in the advanced value analysis tutorial. The first episode was <a href="/skein/value/2010/09/30/Progress-of-the-value-analysis-tutorial2" hreflang="en">here</a> and the second one <a href="/index.php?post/2010/10/07/Value-analysis-tutorial%2C-part-2" hreflang="en">here</a>.</p>
<p>There were a couple of questions left unanswered at the break. This post answers the first one.</p>
<blockquote><p>Quiz 1: continuing the study of the first five calls to <code>Skein_256_Update</code> what used to cause the warning below and what made it disappear?</p>
<p><code>lib.c:18:<a href="kernel" title="kernel">kernel</a> warning: out of bounds write. assert \valid(tmp);</code></p>
</blockquote>
<p>Value analysis messages in the log are output in order. To understand what the analysis was doing when an alarm was emitted it is recommended to look at the surrounding entries in the log. Using the command-line <code>frama-c -val -slevel 100 -slevel-function main:0 *.c &gt; log</code>&nbsp;on the version of <code>main</code> without unrolling we get the following information about the circumstances that cause the value analysis to worry about line lib.c:18 in function&nbsp;<code>memset</code>:</p>
<pre>...
[value] Done for function Skein_256_Update
[value] computing for function Skein_256_Final &lt;-main.
Called from main.c:44.
[value] computing for function memset &lt;-Skein_256_Final &lt;-main.
Called from skein.c:212.
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
[value] Recording results for memset
[value] Done for function memset
...</pre>
<p>Using <code>frama-c-gui</code> to observe the value of the troublesome pointer inside <code>memset</code> we see&nbsp;for the&nbsp;<code>main</code>&nbsp;with unrolling:</p>
<p><code>tmp ∈ {{ &amp;skein_context + [24..87] ; &amp;cfg + [0..31] ;}}</code></p>
<p>Repeating the procedure for the&nbsp;<code>main</code>&nbsp;without unrolling we see:</p>
<p><code>tmp ∈ {{ &amp;skein_context + [24..88] ; &amp;cfg + [0..31] ;}}</code></p>
<p>It seems that the pointer causing the problem is at an offset of 88 bytes inside <code>skein_context</code>. To confirm this still while in <code>frama-c-gui</code> we can select any statement of the function <code>main</code> and use "Evaluate expression" in the contextual menu to evaluate the expression <code>sizeof(skein_context)</code>. The result is as follows:</p>
<pre>[...] all the values taken by the expression sizeof(skein_context) are contained in {88; }</pre>
<p>Note: not all C expressions can be evaluated this way but expressions of the form&nbsp;<code>sizeof(v)</code>&nbsp;can at a statement where the variable <code>v</code> is in scope. This is often useful to make sense of the offsets in bytes displayed by the value analysis.</p>
<p>So the problem is indeed that the value analysis thinks that the pointer may refer to offset 88 of <code>skein_context</code> when dereferenced inside <code>memset</code>. This address is out of range. According to the log snippet above this happens when <code>memset</code> is called from <code>Skein_256_Final</code>. The log points us to line 212 in file skein.c that is the first call in that function. This call is guarded by the following conditional and comment:</p>
<code><p>if (ctx-&gt;h.bCnt &lt; SKEIN_256_BLOCK_BYTES) &nbsp; /* zero pad b[] if necessary */</p>
<p>&nbsp;&nbsp; &nbsp; &nbsp;memset(&amp;ctx-&gt;b[ctx-&gt;h.bCnt] 0 SKEIN_256_BLOCK_BYTES - ctx-&gt;h.bCnt);</p>
</code><p>In that call the pointer argument is <code>&amp; ctx-&gt;b[ctx-&gt;h.bCnt]</code> and the values recorded at the <code>if</code> for <code>ctx-&gt;h.bCnt</code> are {0; 16; 32; } whereas the values at the call&nbsp;are {0; 16; } (the analysis knows that in the case of 32 the call is not executed because the condition <code>ctx-&gt;h.bCnt &lt; SKEIN_256_BLOCK_BYTES</code> is false).</p>
<p>If you still have the <code>frama-c-gui</code> with the results for the main with unrolling lying around you can check that in that case the values for&nbsp;<code>ctx-&gt;h.bCnt</code>&nbsp;are {16; 32; } at the <code>if</code> and { 16; } at the call.</p>
<p>This leads us to the following conclusion: the imprecision that leads to the false alarm at lib.c:18 is caused by analyzing the call to <code>memset</code> with imprecise arguments in function <code>Sein_256_Final</code>. By unrolling the first few calls to <code>Skein_256_Update</code> we also forced these calls to happen systematically and that made the analysis conditions for <code>Skein_256_Final</code> simpler. When completing the verification with separate analyses of the omitted cases the arguments to <code>memset</code> are precise again (and complementary) so the false alarm is omitted again.</p>
<p>In fact if we kept investigating in the same fashion we would find that the case&nbsp;<code>ctx-&gt;h.bCnt==0</code> corresponds to the absence of any call to <code>Skein_256_Update</code>. This case is indeed covered by the <code>while</code> loop in the <code>main</code> analysis context without unrolling. The alarm we obtained when analyzing that loop does not mean there is a problem with the sequence <code>Skein_256_Init(...); Skein_256_Final(...);</code> only that some peculiar values happen in this case and that analyzing this case together with the others leads to false alarms. When verifying software that has already been tested and that its authors take pride on it makes sense to heuristically assume that alarms are false alarms when investigating them. One should however remain honest and continue to investigate until the alarm is understood—ideally until the value analysis can confirm that there was no alarm after all.</p>
<p>Next time the answer to the other quiz.</p>
{% endraw %}
......@@ -38,35 +38,5 @@ summary: <p>This post shows various ways to specify the <code>loop assigns</code
<p>This time Jessie accepts to generate the proof obligation<sup>[<a href="#pnote-8-1" id="rev-pnote-8-1">1</a>]</sup> but the automated theorem provers seem unable to handle it. In fact This annotation alone is not provable even if we assume that all pointer accesses are valid and no arithmetic overflow occurs. Why is that? Well I guess this will be the subject of the quizz for next post.</p>
<div class="footnotes"><h4>Notes</h4>
<p>[<a href="#rev-pnote-8-1" id="pnote-8-1">1</a>] The analyses have been made with Frama-C Boron and Why 2.27</p>
</div> <h2>Context</h2>
<p><a href="/acsl/2010/10/06/Specification-of-loop-assigns">Last week's post</a> asked how to express a loop assigns for the following code:</p>
<pre>void fill(char* a char x) {
for(;*a;a++)
*a = x;
return;
}
</pre>
<p>The difficulty here is that the pointer <code>a</code> is modified at each step of the loop and that we must take this fact into account when writing the <code>loop assigns</code></p>
<h2>Very broad (but correct) annotations</h2>
<p>Remember that (<code>loop</code>) <code>assigns</code> clauses are an overapproximation of the locations that are actually written. Thus a first possibility is to say that each step can potentially modify any byte from the block pointed to by <code>a</code> (and a itself of course).</p>
<pre>loop assigns a a[..];
</pre>
<p>Since we do not specify any of the bounds in the range we really mean that each step is allowed to modify any location reachable from (the current value of) a in any direction. In fact it is not too difficult to be slightly more precise: <code>a</code> is the upper bound of the locations that have been written to so far so that the following clause is also correct:</p>
<pre>loop assigns a a[..(-1)];
</pre>
<p>However this assignment is too broad: if we call <code>fill</code> with a parameter that points to the middle of a block the <code>loop assigns</code> tells that the beginning of the block can be modified too. Let's see if we can do better.</p>
<h2>Precise annotations</h2>
<p>In fact we want to say that before entering the current loop step we have modified the set of locations that are between the initial (which is denoted as <code>\at(a Pre)</code> in ACSL's terms) and the current value of <code>a</code> the latter exlcuded. This can be done in ACSL by defining the set in comprehension: the ACSL expression below is literally the translation of the previous sentence.</p>
<pre>loop assigns a *{ p | char* p; \at(a Pre) &lt;= p &lt; a };
</pre>
<p>Unfortunately such definition is not handled by the Jessie plugin yet and we can not use Frama-C to verify that it is indeed correct (at least it is accepted by the parser but that only means that this is a well-typed expression). In fact Jessie is more comfortable with ranges of location and we can try to use that starting either from <code>a</code> and going backward or from <code>\at(a Pre)</code> and going forward:</p>
<pre>loop assigns a a[(\at(a Pre)-a)..(-1)];
</pre>
<p>Alternative annotation starting from the initial value of <code>a</code>:</p>
<pre>loop assigns a \at(a Pre)[0..(a-\at(a Pre)-1)];
</pre>
<p>This time Jessie accepts to generate the proof obligation<sup>[<a href="#pnote-8-1" id="rev-pnote-8-1">1</a>]</sup> but the automated theorem provers seem unable to handle it. In fact This annotation alone is not provable even if we assume that all pointer accesses are valid and no arithmetic overflow occurs. Why is that? Well I guess this will be the subject of the quizz for next post.</p>
<div class="footnotes"><h4>Notes</h4>
<p>[<a href="#rev-pnote-8-1" id="pnote-8-1">1</a>] The analyses have been made with Frama-C Boron and Why 2.27</p>
</div>
</div>
{% endraw %}
......@@ -24,22 +24,5 @@ undefined behaviors. Fast alias analyses intended to scale to millions of lines
<h2>What is a well-defined execution?</h2>
<p>The value analysis is more precise and more aggressive than the typical large-scale alias analysis in its handling of what look like possible quirks in the analyzed program. The reason the <code>-val-signed-overflow-alarms</code> option mentioned above is not enabled by default is that several test programs became meaningless. The analysis was in effect saying "look this program is sure to produce an overflow here at the third line there is no point in going further until this has been resolved" and refused to analyze the rest of the program. The typical large-scale context-insensitive path-insensitive alias analysis does not have this problem but that's only because it is too imprecise to have it. It is still making assumptions such that you do not access <code>p</code> when you read or write from <code>*(&amp;q+1)</code>.</p>
<p>So does this mean that all high-level analyses of C programs are meaningless unless the absence of undefined behaviors in target program is also verified(1)? In the style of Dan Simmons' Hyperion cycle we are going to leave this crucial question for another volume that is a future blog post.</p>
<p>(1) which it rarely is in practice. As pointed out earlier most of them do not even tell you whether you should worry and what to worry about.</p>
<h2>Prologue</h2>
<p><strong>The C standard(s) specifies a minimum of things</strong> that all C compilers must agree on. For the sake of efficiency, many syntactically correct constructs are left without an unambiguous meaning.</p>
<p>The worst way for a construct not to have an unambiguous meaning is to be <em>undefined behavior</em>. An example of undefined behavior is invalid pointer access. Not all platforms have the required MMU, and even when they do, the MMU does not catch all invalid memory uses. Accessing an invalid pointer is not guaranteed to crash the program cleanly. A program that does so may \crash cleanly" or continue executing in an inconsistent state. The program may crash a little bit later because of the inconsistency or it may appear to work fine and return a wrong result or it may reformat your harddrive. All these behaviors and others are encompassed by "undefined behavior".</p>
<h2>Compilers and compilation-platform-specific static analysis</h2>
<p>Recently C compilers have started to take advantage of undefinedness in the C99 standard for the sake of aggressive optimization. They apply program transformations that are <strong>only valid in the case of defined behaviors</strong> and leave the program doing strange unpredictable things in other cases. John Regehr has a very good explanation of C's undefined behaviors that spans <a href="http://blog.regehr.org/archives/213" hreflang="en">three blog posts</a>.</p>
<p>There are other more benign ways for a construct not to have an unambiguous meaning though. <em>Implementation-defined behaviors</em> must be documented and consistent for a given compiler. In the case of <em>unspecified behaviors</em> one of several plausible alternatives must happen. In Frama-C we have reclassified some unspecified and even some undefined behaviors as implementation-defined. For instance the value analysis assumes by default that signed arithmetic overflows have been put there on purpose by the programmer and that he intended 2's complement modulo results. In this default configuration the value analysis does not warn about these overflows and propagate 2's complement modulo results. John Regehr convinced us however that this was not satisfactory so there is now an option <code>-val-signed-overflow-alarms</code> for when the programmer is not supposed to rely on this behavior (signed arithmetic overflows are technically undefined behavior). The Jessie plug-in allows roughly the same distinction with the defaults reversed (by default you get alarms for integer overflows).
In short a Frama-C analysis session is parameterized by a <strong>description of the compilation platform</strong> that includes genuine or reclassified implementation-defined choices.</p>
<p>That still leaves of course some unspecified and undefined behaviors that the value analysis or Jessie plug-ins do indeed warn about and prove the absence of when they do not warn about them. Each construct that may cause some kind of unspecifiedness or undefinedness is the object of a warning containing the word <code>assert</code> in the value analysis log. Terms and conditions apply.</p>
<p>This post is about the one use case where the user does not have to worry about the alarms So if you were tired of the ongoing thread about the verification of Skein-256 good news! This is something else.</p>
<h2>Derived analyses</h2>
<p>Well I say "one" use case but it's really a family of use cases. I am talking about all the <strong>analyses that are helping the user to comprehend the code</strong> and that can therefore not mandate that all the alarms have been eliminated through the kind of laborious inspection and refinement illustrated in the Skein verification thread. Instead the values and therefore the results of the analysis that relies on them are guaranteed to apply to well-defined executions only (executions that do not falsify any of the assertions that have been emitted). As long as this is kept in mind it may even not be necessary to look at these alarms. In fact it is standard for high-level analyzers of C programs to <strong>assume the absence of undefined behaviors</strong> or apply only to executions without
undefined behaviors. Fast alias analyses intended to scale to millions of lines of C for instance assume that you do not transmit an address to pointer <code>p</code> by writing it in <code>*(&amp;q+1)</code> where <code>q</code> is the address below <code>p</code> in memory. These analyses and any other analyses that rely on the first ones do not tell you whether you should have reasons to worry about this happening. If you should worry they do not give you a list of dangerous constructs in the target program. The value analysis provides such a list of alarms but you can ignore it if you wish and then you get results with similar caveats.</p>
<p>Well almost. We are finally arriving to the core of the problem.</p>
<h2>What is a well-defined execution?</h2>
<p>The value analysis is more precise and more aggressive than the typical large-scale alias analysis in its handling of what look like possible quirks in the analyzed program. The reason the <code>-val-signed-overflow-alarms</code> option mentioned above is not enabled by default is that several test programs became meaningless. The analysis was in effect saying "look this program is sure to produce an overflow here at the third line there is no point in going further until this has been resolved" and refused to analyze the rest of the program. The typical large-scale context-insensitive path-insensitive alias analysis does not have this problem but that's only because it is too imprecise to have it. It is still making assumptions such that you do not access <code>p</code> when you read or write from <code>*(&amp;q+1)</code>.</p>
<p>So does this mean that all high-level analyses of C programs are meaningless unless the absence of undefined behaviors in target program is also verified(1)? In the style of Dan Simmons' Hyperion cycle we are going to leave this crucial question for another volume that is a future blog post.</p>
<p>(1) which it rarely is in practice. As pointed out earlier most of them do not even tell you whether you should worry and what to worry about.</p>
{% endraw %}
......@@ -8,45 +8,7 @@ title: "Loop assigns, part 3: On the importance of loop invariants"
summary: This post is the third of a serie dealing with loop assigns. The first can be found\r\n[here|/acsl/2010/10/06/Specification-of-loop-assigns] and the second [here|/index.php?post/2010/10/15/loop-assigns%2C-part-2].
---
{% raw %}
!!!! Context
Last week's post mentioned that it is not possible to prove the following @@loop assigns@@ with Jessie:
///
void fill(char* a, char x) {
//@ loop assigns a, \at(a,Pre)[0..(a-\at(a,Pre)-1)];
for(;*a;a++)
*a = x;
return;
}
///
and in fact this annotation is not provable. An hint to where the issue lies is given by the title of this post: we don't provide enough context on what the loop steps are doing.
!!!! Loop invariants
One (perhaps the most) important thing to remember when doing deductive verification is that one has to provide sufficiently strong @@loop invariant@@ for each loop of the program. Formally, @@loop invariant@@ is a property which must hold before entering the loop and is preserved by a loop step (''i.e.'' if it holds before a step, it must also hold after the step has been executed). Another way to look at this is to say that the invariant abstracts away what has been done by the first @@k@@ steps (for an arbitrary @@k@@). In this view, the loop invariant is __the only thing which is known about the state of the program__ after an arbitrary number of loop steps, and thus we must provide the right amount of information to be able to prove
# our property of interest (here @@loop assigns@@, which is a special case of invariant)
# the loop invariant itself (otherwise we just exchange an unprovable goal against another, which is not that useful regarding the correctness of the whole function).
In our case, the missing invariant is simply that @@a@@ is increasing: if we don't mention that, then for all what Jessie knows @@a-\at(a,Pre)-1@@ could well be negative (since @@a@@ itself has been modified, but we don't say anything about its new value), and thus our assigned interval be empty, whereas it should at least include @@a-1@@. Thus a sufficient loop invariant would be:
///
loop invariant \at(a,Pre) - a <= 0;
///
and this time, the @@loop assigns@@ can be discharged on the whole program:
///
void fill(char* a, char x) {
/*@ loop assigns a, a[(\at(a,Pre)-a)..(-1)];
loop invariant \at(a,Pre) - a <= 0;
*/
for(;*a;a++)
*a = x;
return;
}
///
As said in the previous, we are not interested for the moment in safety issues (\i.e." we assume that all pointer accesses are valid for now). We thus instruct Jessie not to generate the proof obligations corresponding to potential run-time errors with a particular @@-jessie-behavior@@ option:
///
frama-c -jessie -jessie-behavior default -jessie-atp alt-ergo
///
Unfortunately alt-ergo is still not able to discharge the proof and neither can Simplify nor z3. It seems like the proof obligation generated by jessie is too complicated for automated theorem provers. However the specification is correct as the attached coq file that discharge all proof obligations generated by
///
frama-c -jessie -jessie-behavior default -jessie-atp coq
///
shows." <h2>Context</h2>
<h2>Context</h2>
<p>Last week's post mentioned that it is not possible to prove the following <code>loop assigns</code> with Jessie:</p>
<pre>
void fill(char* a, char x) {
......
......@@ -19,17 +19,5 @@ summary:
<p>So far so good. If there is a <code>10.0</code> literal in the program it stands for the double 10.0. If there is a <code>0.1</code> in the program it stands for one of representable numbers no further than 1ulp from the real 0.1. This is slightly annoying because the representable number chosen by CIL may not be the same as the representable number chosen by your actual C compiler but in all likelihood both will round the number from the source code to the nearest double-precision floating-point number. Thus the AST analyzed in Frama-C accurately represents the compiled program and all is well.</p>
<p>The picture gets muddied a little if the target program contains a literal such as <code>3.14f</code>. This is supposed to represent a single-precision floating-point literal in C. I'm willing to bet without really looking that CIL represents that literal in the AST with the double-precision number nearest to the real number written in decimal in the target program. If you are a Frama-C plug-in author willing to do the right thing you may think of rounding this double-precision number to the nearest single-precision number yourself but that doesn't work: the double rounding can make your final result different from the compiler's which will round only once from real to single-precision. To be correct you have to go back to the string representation that has been saved from the source code.</p>
<p>In fact Frama-C's front-end should be doing this for all plug-ins of course. When the literal is single-precision it should represent it in the AST as an OCaml float containing the single-precision number meant by the programmer (all single-precision numbers can be represented as double-precision numbers). Probably this will be done soon now that the problem is identified. This was just an example of how far support for single-precision numbers in Frama-C is.</p>
<p>Full disclosure: I said I was betting without looking but I wouldn't take the bet if I were you. I do not just risk public humiliation like that without strong presumptions. First I was just working on something related and visible effects of this subtlety came up as one annoying unforeseen issue. Secondly there is no function in OCaml to parse a number as single-precision. There are some OCaml functions to parse double-precision numbers and this task is enough of a bother that these are defined by calling the C function <code>strtod()</code>. The way to fix the front-end is probably to replicate this OCaml-C interface for <code>strtof()</code>. So my bet is in fact that CIL's authors neither implemented a single-precision parsing function in OCaml nor interfaced it from C.</p>
<p>Every once in a while, someone asks about single-precision floating-point support in Frama-C. Until recently it was often in the context of the value analysis, but actually, thanks to a lot of interesting new results obtained in the context of <a href="http://hisseo.saclay.inria.fr/index.html" hreflang="en">this project</a> people working on deductive verification within Frama-C can now look forward to being harassed too.</p>
<p>We are not nearly home and dry yet though. Frama-C's front-end is <a href="http://www.cs.berkeley.edu/~necula/cil/" hreflang="en">CIL</a> (no relation to something from Microsoft with a similar acronym). CIL is written in <a href="http://caml.inria.fr/" hreflang="en">OCaml</a> (the rest of Frama-C is written in OCaml too but never mind that). OCaml has a single floating-point type which corresponds to <a href="http://en.wikipedia.org/wiki/IEEE_754-2008" hreflang="en">IEEE 754</a> double precision.</p>
<p>So guess how floating-point literal constants are represented in the abstract syntax tree built by CIL...</p>
<p>If you guessed "as an IEEE 754 double-precision number" you are partly right. Yay! But also as a string. CIL's authors did think about this especially since one of CIL's first use was for source-to-source transformation. Specifically you have all the information in this constructor defined in file frama-c/cil/src/cil_types.mli:</p>
<pre>| CReal of float * fkind * string option
(** Floating point constant. Give the fkind (see ISO 6.4.4.2) and also
* the textual representation if available. *)</pre>
<p>The above is OCaml code and the type <code>float</code> means double-precision floating-point number.</p>
<p>So far so good. If there is a <code>10.0</code> literal in the program it stands for the double 10.0. If there is a <code>0.1</code> in the program it stands for one of representable numbers no further than 1ulp from the real 0.1. This is slightly annoying because the representable number chosen by CIL may not be the same as the representable number chosen by your actual C compiler but in all likelihood both will round the number from the source code to the nearest double-precision floating-point number. Thus the AST analyzed in Frama-C accurately represents the compiled program and all is well.</p>
<p>The picture gets muddied a little if the target program contains a literal such as <code>3.14f</code>. This is supposed to represent a single-precision floating-point literal in C. I'm willing to bet without really looking that CIL represents that literal in the AST with the double-precision number nearest to the real number written in decimal in the target program. If you are a Frama-C plug-in author willing to do the right thing you may think of rounding this double-precision number to the nearest single-precision number yourself but that doesn't work: the double rounding can make your final result different from the compiler's which will round only once from real to single-precision. To be correct you have to go back to the string representation that has been saved from the source code.</p>
<p>In fact Frama-C's front-end should be doing this for all plug-ins of course. When the literal is single-precision it should represent it in the AST as an OCaml float containing the single-precision number meant by the programmer (all single-precision numbers can be represented as double-precision numbers). Probably this will be done soon now that the problem is identified. This was just an example of how far support for single-precision numbers in Frama-C is.</p>
<p>Full disclosure: I said I was betting without looking but I wouldn't take the bet if I were you. I do not just risk public humiliation like that without strong presumptions. First I was just working on something related and visible effects of this subtlety came up as one annoying unforeseen issue. Secondly there is no function in OCaml to parse a number as single-precision. There are some OCaml functions to parse double-precision numbers and this task is enough of a bother that these are defined by calling the C function <code>strtod()</code>. The way to fix the front-end is probably to replicate this OCaml-C interface for <code>strtof()</code>. So my bet is in fact that CIL's authors neither implemented a single-precision parsing function in OCaml nor interfaced it from C.</p>
{% endraw %}
......@@ -79,77 +79,5 @@ skein_context.h.bCnt ∈ {16; 32; }</pre>
}
</pre>
<p>With the command-line <code>frama-c -val -slevel 100 -slevel-function main:0 *.c</code> the value analysis does not emit any alarm for this <code>main()</code>. The case of one call was already handled in the tutorial chapter of the value analysis documentation. The separate cases of 0 calls and of an odd number of calls larger than three are left to the reader and this time there will not be any further refinement of this solution in a future blog post. Ask on the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/frama-c-discuss" hreflang="en">mailing list</a> if you encounter a problem finishing the verification from here.</p>
<p>In the next episode an alternative and more generalizable solution to the same problem. Or maybe a meta-discussion about the verification methodology being sketched out in this thread.</p>
<p>This post offers one answer to the second quiz from <a href="/index.php?post/2010/10/07/Value-analysis-tutorial%2C-part-2" hreflang="en">part 2</a>. For context here are links to <a href="/skein/value/2010/09/30/Progress-of-the-value-analysis-tutorial2" hreflang="en">part 1</a> and <a href="/index.php?post/2010/10/15/Value-analysis-tutorial%2C-part-3" hreflang="en">part 3</a>.</p>
<p>The question was: how should we get rid of the last alarm below and conclude that Skein-256 is indeed safe from run-time errors when used in the conditions previously described?</p>
<pre>lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
</pre>
<p>The alarm appears when using the command-line:</p>
<pre>frama-c-gui -val -slevel 100 -slevel-function main:0 *.c</pre>
<p>Inspecting values inside function <code>memcpy</code> we can see that the problematic address is again at offset 88 inside structure <code>skein_context</code>. Following the same procedure as in <a href="/index.php?post/2010/10/15/Value-analysis-tutorial%2C-part-3" hreflang="en">part 3</a> we identify this context for the alarm inside the analysis log:</p>
<pre>...
[value] computing for function Skein_256_Update &lt;-main.
Called from main.c:40.
[value] computing for function memcpy &lt;-Skein_256_Update &lt;-main.
Called from skein.c:171.
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp);
[value] Recording results for memcpy
[value] Done for function memcpy
...
</pre>
<p>Several remarks:</p>
<ol>
<li>the problematic <code>Skein_256_Update</code> call is the call inside the <code>while</code> loop (at line main.c:40). This is completely unsurprising.</li>
<li>Function <code>memcpy</code> is called several times from <code>Skein_256_Update</code> but the call that results in an alarm is the call at skein.c:171.</li>
</ol>
<p>Inspecting the incriminated call site in the GUI the address argument to <code>memcpy</code> is <code>&amp;ctx-&gt;b[ctx-&gt;h.bCnt]</code> and the values that the analysis thinks are taken by <code>ctx-&gt;h.bCnt</code> at this point are 16 and 32. The length argument <code>n</code> is always 16. Since the array <code>b</code> inside struct <code>skein_context</code> is only 32 bytes it is normal to get an alarm inside <code>memcpy</code> with such arguments.</p>
<p>Now the reason we originally unrolled a few calls to <code>Skein_256_Update</code> was in order to reverse-engineer what the function was doing much as if we had inserted calls to <code>printf</code> inside the source code except without the need to choose in advance the values to inspect. So let us use the GUI to check the values of <code>skein_context.h.bCnt</code> for the first few calls.</p>
<p>Using "Evaluate expression" on the first call to <code>Skein_256_Update</code>:</p>
<pre>Before statement:
skein_context.h.bCnt ∈ {0; }
At next statement:
skein_context.h.bCnt ∈ {16; }</pre>
<p>On the second call:</p>
<pre>Before statement:
skein_context.h.bCnt ∈ {16; }
At next statement:
skein_context.h.bCnt ∈ {32; }</pre>
<p>On the third call:</p>
<pre>Before statement:
skein_context.h.bCnt ∈ {32; }
At next statement:
skein_context.h.bCnt ∈ {16; }</pre>
<p>On the fourth call:</p>
<pre>Before statement:
skein_context.h.bCnt ∈ {16; }
At next statement:
skein_context.h.bCnt ∈ {32; }</pre>
<p>On the fifth and last call before entering the loop:</p>
<pre>Before statement:
skein_context.h.bCnt ∈ {32; }
At next statement:
skein_context.h.bCnt ∈ {16; 32; }</pre>
<p>Do not let yourself be distracted by the last "At next statement:" information: this one is imprecise because the "next statement" is inside the <code>while</code> loop. Apart from that it is rather clear what is happening: each call after the first one changes the <code>bCnt</code> field from 16 to 32 or conversely. Inside the <code>while</code> loop the two values end up amalgamated together. That makes it look like there may be a problem with the <code>memcpy</code> call. To make it clear there is no problem we probably just need to group the calls by two. That leads us for instance to write the following <code>main</code> function to verify the safety of all sequences of an even number of calls to <code>Skein_256_Update</code>.</p>
<pre>void main(void)
{
int i;
u08b_t hash[HASHLEN];
Skein_256_Ctxt_t skein_context;
Skein_256_Init(&amp;skein_context HASHLEN * 8);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
while (Frama_C_interval(0 1))
{
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
arbitrarize_msg();
Skein_256_Update(&amp;skein_context msg 80);
}
Skein_256_Final( &amp;skein_context hash);
}
</pre>
<p>With the command-line <code>frama-c -val -slevel 100 -slevel-function main:0 *.c</code> the value analysis does not emit any alarm for this <code>main()</code>. The case of one call was already handled in the tutorial chapter of the value analysis documentation. The separate cases of 0 calls and of an odd number of calls larger than three are left to the reader and this time there will not be any further refinement of this solution in a future blog post. Ask on the <a href="http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/frama-c-discuss" hreflang="en">mailing list</a> if you encounter a problem finishing the verification from here.</p>
<p>In the next episode an alternative and more generalizable solution to the same problem. Or maybe a meta-discussion about the verification methodology being sketched out in this thread.</p>
{% endraw %}
......@@ -50,48 +50,5 @@ while(Frama_C_interval(0 1))
<p>Using local variables as in the above C code snippet the possibility of such a breach can be eliminated: we cannot prevent function <code>Update</code> to keep trace of a previous input buffer in its state but if it then accessed it or even just compared it to another address this would be detected as an operation on a dangling pointer and an alarm would be emitted.</p>
<blockquote><p>Digression: actually we can also prevent function <code>Update</code> to keep trace of a previous input buffer thanks to secondary analyses derived from the value analysis. One of them computes the set of locations written to by a C function. With this analysis we can list the entire set of memory locations that <code>Update</code> uses as state and check that only information that should be kept is. <a href="/derived/analysis/skein/value/2011/01/10/Value-analysis-assisted-verification-of-output-variables-and-information-flow">Here is a future blog post on this topic</a>.</p>
</blockquote>
<p>With the future Frama-C release the verification takes about 5h and 150MiB of memory on a computer that was near top-of-the-line in 2006 (Intel Xeon 5150). Since we are speaking of performance this analysis is single-threaded. With the memory footprint that it now has there is hope that it will get a bit faster in time as a larger and larger proportion of the working set fits in lower and lower levels of memory cache. A parallel value analysis applicable to some verification plans (but untried for this one) is also at the proof-of-concept stage. Where applicable this parallel analysis will allow to take advantage of multicore processors and computation farms.</p>
<p>This post is in two parts, both of them independently good fits for the title, and still not completely without relation to each other, but that's probably a coincidence.</p>
<h2>Methodology</h2>
<p>In this thread, we aim at the verification of low-level properties for the functions in Skein. In the last post, I closed one line of reasoning as if the objective had been reached. But someone might object. \How do you know you can trust the verifier?" ey might say. "What have you really proved?". And "are you asking me to believe that all cases are covered with separate analyses for 0 1 even numbers greater than 2 and odd numbers greater than 3?".</p>
<p>Answers in reverse order:</p>
<ol>
<li>yes. In C you cannot call a function a negative number of times so I am asking you to believe that this partition covers the "calling Update(... 80) an arbitrary number of times" part of the informal specification we settled on <a href="/skein/value/2010/09/30/Progress-of-the-value-analysis-tutorial2" hreflang="en">from the beginning</a>.</li>
<li>The middle question refers to the fact that there is a translation of sorts from the informal requirement or specification to the formal one. This is not unlike the translation in traditional verification from the informal requirement to executable test (or code review) plans. Yes there are dangers here but the dangers are not new. Both with testing or with more formal verification as we have been outlining you can make unjustified assumptions or forget a critical case. That weakness does not prevent replacing some tests and code reviews with formal methods since all these techniques have the weakness. If anything formal specifications that are slightly more expressive than individual test cases or review objectives are slightly closer to informal speech and could be argued to have a slightly lower and therefore slightly less risky translation step.</li>
<li>The first question is even less original than the middle one. We gain confidence in the verifier with tests and code reviews. They didn't suddenly stop working you know.</li>
</ol>
<h2>Jumping ahead to the conclusion of the Skein-256 verification</h2>
<p>All the techniques to deploy have not been exposed yet in this blog but now that the thread gives a good idea of how absence of run-time errors can be verified in very general circumstances using the value analysis I thought I would hint at the final results. One of the techniques to deploy is (as of November 2010) the next Frama-C release which contains many optimizations and performance improvements so there is no hurry for me to go into the details (yet).</p>
<p>Following roughly the same method we have verified in-house that there wasn't any alarm when calling <code>Skein_256_Update</code> an arbitrary number of times between <code>Skein_256_Init</code> and <code>Skein_256_Final</code> using for each call to <code>Skein_256_Update</code> a value <code>n</code> for the length and an input buffer of length exactly <code>n</code> with for each call an arbitrary independent <code>n</code> between 1 and 128.</p>
<p>The question of what has been proved is even more acute than previously. The above informal specification could have several interpretations. Let us just say that one part of the <code>main</code> function used for the verification looks like this:</p>
<pre>...
while(Frama_C_interval(0 1))
{
{
int c = Frama_C_interval(0 1);
if (c)
{
u08b_t msg[1];
arbitrarize_msg(msg 1);
Skein_256_Update( &amp;skein_context msg 1);
continue;