Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
---
layout: post
author: Pascal Cuoq
date: 2011-08-10 08:30 +0200
categories: OCaml csmith
format: xhtml
title: "Csmith testing reveals that I'm no good at probabilities (and lazy)"
summary:
---
{% raw %}
<h2>Csmith testing</h2>
<p>A typical Frama-C Csmith testing script repeats four actions in an infinite loop:</p>
<ol>
<li>getting a random program from Csmith;</li>
<li>compiling and executing it;</li>
<li>analyzing it with Frama-C;</li>
<li>using the results from step 3, possibly together with those of step 2, to determine whether the program reveals something that would interest a Frama-C developer.</li>
</ol>
<p>Step four is the difficult part. I definitely want to hear about varied events that could happen during the analysis, and step four represents about half the testing script's line count. One difficulty is that if a Frama-C bug has already been identified, but isn't fixed yet, I do not want to be warned again about that bug until it is supposed to have been fixed. Sometimes it is possible to use Csmith command-line options to generate only programs that do not trigger the bug. Sometimes, this is impossible, and you have to recognize the bug in step 4.</p>
<p>A particular known bug in Frama-C was characterized by the backtrace below, that I am posting in its entirety to give you a chance to identify my mistake before I tell you about it.</p>
<pre> Raised at file \src/memory_state/new_offsetmap.ml" line 361 characters 26-37
Called from file "src/memory_state/new_offsetmap.ml" line 377 characters 17-64
Called from file "src/memory_state/new_offsetmap.ml" line 1394 characters 15-34
Called from file "src/memory_state/new_offsetmap.ml" line 1436 characters 18-114
Called from file "set.ml" line 293 characters 37-58
Called from file "src/memory_state/new_offsetmap.ml" line 1433 characters 11-399
Called from file "src/memory_state/offsetmap.ml" line 1308 characters 18-93
Called from file "src/memory_state/lmap.ml" line 641 characters 24-260
Called from file "src/memory_state/lmap.ml" line 626 characters 18-1023
Called from file "src/memory_state/cvalue.ml" line 1171 characters 12-50
Called from file "src/value/eval_exprs.ml" line 1399 characters 4-61
Called from file "src/value/eval_exprs.ml" line 1447 characters 4-75
Called from file "src/value/eval_exprs.ml" line 1058 characters 8-66
Called from file "src/value/eval_exprs.ml" line 986 characters 10-62
Called from file "src/value/eval_exprs.ml" line 1056 characters 8-52
Called from file "src/value/eval_exprs.ml" line 1207 characters 4-55
Called from file "src/value/eval_stmts.ml" line 446 characters 6-105
Called from file "src/value/eval_stmts.ml" line 880 characters 54-69
Called from file "list.ml" line 74 characters 24-34
Called from file "src/memory_state/state_set.ml" line 26 characters 2-24
Called from file "src/value/eval_stmts.ml" line 889 characters 14-232
Called from file "cil/src/ext/dataflow.ml" line 320 characters 29-47
Called from file "cil/src/ext/dataflow.ml" line 497 characters 8-21
Called from file "cil/src/ext/dataflow.ml" line 501 characters 9-22
Called from file "src/value/eval_funs.ml" line 100 characters 14-37
Called from file "src/value/eval_funs.ml" line 502 characters 8-63
Called from file "src/value/eval_stmts.ml" line 819 characters 12-71
Called from file "src/value/eval_stmts.ml" line 831 characters 10-114
Called from file "src/value/eval_stmts.ml" line 864 characters 38-62
Called from file "list.ml" line 74 characters 24-34
Called from file "src/value/eval_stmts.ml" line 920 characters 26-79
Called from file "cil/src/ext/dataflow.ml" line 320 characters 29-47
Called from file "cil/src/ext/dataflow.ml" line 497 characters 8-21
Called from file "cil/src/ext/dataflow.ml" line 501 characters 9-22
Called from file "src/value/eval_funs.ml" line 100 characters 14-37
Called from file "src/value/eval_funs.ml" line 502 characters 8-63
Called from file "src/value/eval_stmts.ml" line 819 characters 12-71
Called from file "src/value/eval_stmts.ml" line 831 characters 10-114
Called from file "src/value/eval_stmts.ml" line 864 characters 38-62
Called from file "list.ml" line 74 characters 24-34
Called from file "src/value/eval_stmts.ml" line 920 characters 26-79
Called from file "cil/src/ext/dataflow.ml" line 320 characters 29-47
Called from file "cil/src/ext/dataflow.ml" line 497 characters 8-21
Called from file "cil/src/ext/dataflow.ml" line 501 characters 9-22
Called from file "src/value/eval_funs.ml" line 100 characters 14-37
Called from file "src/value/eval_funs.ml" line 482 characters 4-67
Called from file "src/value/eval_funs.ml" line 564 characters 11-44
Re-raised at file "src/value/eval_funs.ml" line 578 characters 47-50
Called from file "src/project/state_builder.ml" line 1076 characters 9-13
Re-raised at file "src/project/state_builder.ml" line 1080 characters 15-18
Called from file "src/value/register.ml" line 49 characters 4-24
Called from file "queue.ml" line 134 characters 6-20
Called from file "src/kernel/boot.ml" line 36 characters 4-20
Called from file "src/kernel/cmdline.ml" line 723 characters 2-9
Called from file "src/kernel/cmdline.ml" line 200 characters 4-8
Unexpected error (New_offsetmap.Make(V).End_reached).
</pre>
<p>The uncaught exception comes from new_offsetmap:361 in an auxiliary function
but in this particular case
the place whence this auxiliary function is called (line 377 in the same file) is
a good way to identify the bug.
If the auxiliary function raises the same exception when the function is called
from another place it's another bug.
If it's from the same call it's the same bug.</p>
<p>I adapted my shell script to include the following test in step 4:</p>
<pre>if grep "line 377" analysis_log
then true # I do not know how to negate a condition in shell and I do not want to learn
else ... # continue classifying the analysis
</pre>
<p>I wanted to grep for file name and line number but in the backtrace the file name is surrounded
by quotes and I didn't know how to
escape the quotes inside the grep command-line.</p>
<h2>Mergeable Interval Maps</h2>
<p>The file <code>new_offsetmap.ml</code> contains the implementation of the data structure
described in <a href="http://studia.complexica.net/Art/RI090101.pdf">this article</a>. Pages 6 to 33
are in English. Incidentally Dotclear does not seem to offer any way to specify
that a linked document is half in one language and half in another.</p>
<p>Although the new data structure's implementation is far from complete it is useful for handling unions
to occasionally translate from the old datastructure to the new one.
Csmith generates programs where unions can be written through a member
and read from another (the members are only of integral types). The old data
structure can handle most sequences of write combinations having occurred before
an union is read but a few complex combinations are not handled.
Richard Bonichon and I already wrote the code to handle all possible combinations
in the new data structure and it would be a waste of time to backport this work
to the old data structure at this point.</p>
<p>You may have guessed that an exception is raised in this module because it's newer
and full of bugs but in fact what is currently implemented already works quite well (one of
the advantages of having lots of tests and a trusted reference implementation).
The reason the new implementation
detects bugs is that knowing in advance what it was going to be used for we included
lots of assertion checks for conditions we knew weren't supposed to happen. The bug comes
in much earlier during parsing and cannot conveniently be worked around when it is detected
in this module.</p>
<h2>My mistake</h2>
<p>What mistake did I make? The backtrace I have shown is 55 lines long. Each of them
contains the pattern "xxx lines" where the values for "xxx" are not independent but we'll
assume that they are for the sake of this argument. All files are more than 377 lines long
and some of them are more than 3779 lines long. It is not sufficiently improbable
that my script would hide from me another bug with a completely different backtrace
that just happens to contain "line 377" or even "line 3775".</p>
<p>I noticed this a few days ago. Resolute in my decision not to learn how to escape quotes
in shell I decided to make false negatives much less likely
by grepping for "line 377 characters 17-64" instead.</p>
<p>This is when I updated Frama-C from source control. The file <code>new_offsetmap.ml</code>
being work in progress the file had been modified and
the line that characterized the bug changed. So I gave up
and started to experiment with grepping special characters.
It turns out that it's possible to grep for doubles quotes by using the simple
syntax below.</p>
<pre>grep 'Raised at file "src/memory_state/new_offsetmap.ml"' analysis_log
</pre>
<p>What is the moral of this story?
What I refused to learn was very simple; I
eventually replaced an approximate test by another that is just as approximate but in
a different way; if you see other angles please write them in the comments.</p>
<h2>Aside</h2>
<p>Are you thinking "55 frames in the call stack? What a bunch of bloated analysis framework! This thing must be completely useless!"? Then you are <a href="/index.php?post/2011/07/16/Formal-verification-and-SLOC">making judgments based on criteria you are familiar with and ignoring those that are new to you</a>. A more correct reaction would be "I know how to implement a much lighter analyzer that will have all the same functionality and will be just as precise in 5000 lines of my favorite language over a week-end; I'll do that and show them."</p>
<p>Zaynah Dargaye provided useful comments on an early version of this post.</p>
{% endraw %}