Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
---
layout: post
author: Pascal Cuoq
date: 2011-01-31 10:12 +0200
categories: OCaml memcpy value value-builtins
format: xhtml
title: "On memcpy (part 2: the OCaml source code approach)"
summary:
---
{% raw %}
<p>When picking up the title for <a href="/index.php?post/2011/01/27/On-memcpy-1" hreflang="en">the previous post on function memcpy</a> I anticipated this second part would describe the <code>Frama_C_memcpy</code> built-in function. The subtitle "part 1: the source code approach" seemed a good idea since the first part was about using C source code to tell the analyzer what this function <code>memcpy</code> it often encounters is.</p>
<p>I did not think further and did not realize that I would spend this sequel post explaining that a built-in function is <strong>an OCaml function that directly transforms an abstract state into another abstract state</strong> without going through the tedious instruction-by-instruction interpretation of C code. So this second post is still about source code.</p>
<p>It is not source code that it is recommended or even documented how to write yourself. But it is a convenient way for value analysis experts to introduce certain new features.</p>
<p><strong>What new features?</strong> Smoking hot blazing fast new features.</p>
<p>Also features that work completely right neither the first nor the second time. The <strong>Formula One</strong> of features if you will. In fact you could be forgiven for calling them Virgin Racing features although I prefer to think of them as Ferrari features.</p>
<p>As just stated builtins are about using OCaml functions to describe the effect of an unavailable C function. For instance calls to <code>Frama_C_memcpy(d s l)</code> are interpreted with an OCaml function that in a single step copies a slice of memory of length <code>l*8</code> bits from <code>s</code> to <code>d</code>. And it is fast. Behold:</p>
<pre>char s[10000] d[10000];
main()
{
// Make problem representative
for (int i=0; i<100; i++)
s[100*i] = i;
// Time one or the other
// c_memcpy(d s 10000);
Frama_C_memcpy(d s 10000);
// Check that d contains the right thing
Frama_C_dump_each();
return 0;
}
</pre>
<p><a href="/assets/img/blog/imported-posts/memcpytest.c">(download the entire program)</a></p>
<p>Using the C version of <code>memcpy</code> and sufficient unrolling you get the expected contents for <code>d</code>:</p>
<pre>$ time bin/toplevel.opt -val -slevel 20000 memcpytest.c -no-results
...
d[0..99] ∈ {0; }
[100] ∈ {1; }
[101..199] ∈ {0; }
[200] ∈ {2; }
...
user 0m23.155s
</pre>
<p>Using the <code>Frama_C_memcpy</code> built-in you get the same precise result for <code>d</code>:</p>
<pre>$ time bin/toplevel.opt -val -slevel 20000 memcpytest.c -no-results
...
d[0..99] ∈ {0; }
[100] ∈ {1; }
[101..199] ∈ {0; }
[200] ∈ {2; }
...
user 0m0.402s
</pre>
<p>You also get nearly 23 seconds of your life back.</p>
<p>So what are the things that can go wrong with a builtin replacement function?</p>
<ol>
<li>If the C code that is not analyzed would have caused an alarm to be emitted the built-in should emit that alarm too or at the very least be provided with a header file that offers an ACSL pre-condition along with a prototype for the builtin. Both alternatives work but it is easy to forget to do either.</li>
<li>All cases of imprecise arguments should be handled. Support for an imprecise length was added recently. Very imprecise source or destination pointers may still cause the analysis to abort (it is being done and I am not sure where we were the last time we were working on this).</li>
<li>The builtin function calls the value analysis' internal functions in contexts that differ from the usual well tested circumstances occurring during the analysis of normal C programs. Strange behaviors or bugs may be uncovered by the user. As an example I was just preparing the next section of this post where I would have shown how imprecise lengths were now handled. I was going to suggest modifying the above test program thus:</li>
</ol>
<pre> Frama_C_memcpy(d s Frama_C_nondet(150 10000));
</pre>
<p>This brought up the following two remarks: first there is an internal function for moving slices of the abstract state around that is slower than it should be and it shows on this test. Secondly the value analysis in Carbon will tell you that bits 1200 to 79999 of <code>d</code> contain either <code>0</code> or a very large number a number with about 23400 decimal digits that start with <code>2164197</code> and end with <code>85824</code>. The value analysis is right of course. If the length argument to <code>Frama_C_memcpy</code> was <code>150</code> then that slice of memory contains zero. If the length was <code>10000</code> the numbers in <code>d</code> that were copied from <code>s</code> can be interpreted as representing exactly that 78800-bit number. This is the value analysis' way of telling you that it's either one or the other: either the length was <code>150</code> or it was <code>10000</code>. This is more informative than telling you cell by cell that the cell contains zero or another number because then you wouldn't see that it's either all one or all the other. The value analysis is rather cute that way.</p>
<blockquote><p>Note: if the 78800-bit number explanation seems terribly complicated do not worry there will be a better explanation for this issue in a future post. We just need another new feature in order to be able to show you.</p>
</blockquote>
<p>In conclusion not everyone may want a Formula One car. But <code>Frama_C_memcpy</code> should at least work for completely unrolled deterministic programs where it can speed up analysis quite a bit compared to the interpretation of a C memcpy.</p>
<p>Thanks to Julien for help with motorsports metaphors and to Boris for help in improving <code>Frama_C_memcpy</code> (it used to be worse).</p>
{% endraw %}