Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
---
layout: post
author: Pascal Cuoq
date: 2011-01-27 13:58 +0200
categories: memcpy value
format: xhtml
title: "On memcpy (part 1: the source code approach)"
summary:
---
{% raw %}
<p><code>memcpy()</code> is one of the few functions standardized as part of C itself instead of an additional API. But that's not what makes it interesting from a static analysis point of view. I think what makes it interesting is that it is used often, and often for tasks that can be described in high-level terms, whereas its implementations range <strong>from the low-level to the horribly low-level</strong>.</p>
<p>Here is a typical <code>memcpy</code> implementation:</p>
<pre>void *
memcpy(void *dst, void *src, size_t length)
{
int i;
unsigned char *dp = dst;
unsigned char *sp = src;
for (i=0; i<length; i++)
*dp++ = *sp++;
return dst;
}
</pre>
<p>And here are snippets of an analysis context that uses it (download <a href="/assets/img/blog/imported-posts/m.c">the entire file</a>):</p>
<pre>struct S { int *p; int *q; int i; double d; int j; };
...
struct S s d;
s.p = &a;
s.q = Frama_C_nondet_ptr(&a &b);
s.i = Frama_C_interval(17 42);
s.j = 456;
s.d = Frama_C_double_interval(3.0 7.0);
memcpy(&d &s sizeof(struct S));
</pre>
<p>Analyzing with <code>frama-c -val .../builtin.c m.c</code> produces the following information about the contents of <code>s</code> which is exactly what we could have expected:</p>
<pre> s.p ∈ {{ &a ;}}
.q ∈ {{ &a ; &b ;}}
.i ∈ [17..42]
.d ∈ [3. .. 7.]
.j ∈ {456; }
</pre>
<p>On the other hand the information for <code>d</code> is much less precise:</p>
<pre> d ∈ {{ garbled mix of &{a; b; } (origin: Misaligned {m.c:11; }) }} or UNINITIALIZED
</pre>
<p>Each field of <code>d</code> is only guaranteed to contain a value that was not computed from base addresses other than <code>&a</code> and <code>&b</code>. The fields of <code>d</code> are not even guaranteed to have been initialized. An ideally precise analysis would tell us that <code>d</code> contains the same values as those displayed for <code>s</code>.</p>
<p>Side remark: if you do run the analysis at home you can ignore the messages about imprecise values for the moment. These are not alarms to act upon but simply informational messages to help trace when the value analysis is imprecise on large programs where the imprecision starts.</p>
<p>If you have been reading this blog from the beginning perhaps as a continuation from the <a href="http://frama-c.com/download/frama-c-value-analysis.pdf" hreflang="en">tutorial</a> you may have noticed by now that whenever the value analysis is not precise enough the first thing to try is option <code>-slevel</code>.</p>
<p><code>frama-c -val .../builtin.c m.c -slevel 100</code></p>
<pre> s.p ∈ {{ &a ;}}
.q ∈ {{ &a ; &b ;}}
.i ∈ [17..42]
.d ∈ [3. .. 7.]
.j ∈ {456; }
d.p ∈ {{ &a ;}}
.q[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 8 to 15]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 16 to 23]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 24 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32
.i[bits 0 to 7]# ∈ [17..42] misaligned 0%32
.i[bits 8 to 15]# ∈ [17..42] misaligned 0%32
.i[bits 16 to 23]# ∈ [17..42] misaligned 0%32
.i[bits 24 to 31]# ∈ [17..42] misaligned 0%32
.d[bits 0 to 7]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 8 to 15]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 16 to 23]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 24 to 31]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 32 to 39]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 40 to 47]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 48 to 55]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 56 to 63]# ∈ [3. .. 7.] misaligned 32%64
.j ∈ {456; }
</pre>
<p>Good news: all fields of <code>d</code> are now guaranteed to be initialized.
In addition the information available for each field is more precise than before.
The information about fields <code>.j</code> and <code>.p</code> has been perfectly preserved during the <code>memcpy</code> from <code>s</code> to <code>d</code> whereas fields <code>.q</code> <code>.i</code> and <code>.d</code> are only known to contain several 8-bit slices of values. This is what the "misaligned ..." part of each line mean.</p>
<p>For instance line ".i[bits 0 to 7]# ∈ [17..42] misaligned 0%32" uses this mention in order to avoid giving the impression that bits 0 to 7 of field <code>.i</code> contain [17..42]. Instead there was a 32-bit value [17..42] somewhere in memory and a 8-bit slice of that value was copied to the first 8 bits or <code>.i</code>.</p>
<p>The fact that all 4 (respectively 8) slices of each field are followed by the same "misaligned" mention with the same numerical values mean that all slices for that field <strong>could</strong> come from the same value and have been copied in order with the first source slice going to the first destination slice ... However the value analysis is not able to guarantee that this is what happened. It has lost the information that bits 0 to 7 and bits 8 to 15 of <code>.i</code> come from the same value which is why the contents of <code>.i</code> are printed thus instead of just displaying that <code>.i</code> is [17..42].</p>
<p>It is important not to stitch together say bit 0 to 7 of <code>{{ &a ; &b ;}} misaligned 0%32</code> and bit 8 to 31 of <code>{{ &a ; &b ;}} misaligned 0%32</code> into <code>{{ &a ; &b ;}}</code> when you do not know that they come from the same value. Consider the example below:</p>
<pre> int a b;
int *p = Frama_C_nondet_ptr(&a &b);
int *q = Frama_C_nondet_ptr(&a &b);
*(char*)&p = *(char*)&q;
</pre>
<p>After analyzing this piece of code the value analysis predicts for <code>p</code>:</p>
<pre> p[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32
[bits 8 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32
</pre>
<p>It would be incorrect to predict that <code>p</code> contains {{ &a ; &b ;}} here because the first call to <code>Frama_C_nondet_ptr</code> may return <code>&a</code> while the second one returns <code>&b</code>. Without the information that the two slices originate from the same value it is incorrect to stitch them together as if they did.</p>
<p>The value analysis does better when there is only one possible concrete value inside the set at hand. Fields <code>.p</code> and <code>.j</code> in the original program although they were copied char by char could be reconstituted because the value analysis knows that there is only one possible way to be {{ &a ; }} (or { 456; }) which implies that properly ordered slices when put side by side can only rebuild &a.</p>
<p>This <code>memcpy</code> if it is called with large values of <code>length</code> in the program and the program is analyzed with the <code>-slevel</code> option may occupy a large fraction of the analysis time. This is because although the analyzer does not keep the information that the small bites of values come from the same value which it could use it keeps the information that there exist plenty of equalities between source and destination chars — and does not use them. This should be considered a known issue with an open-ended fix date.</p>
<p>And this performance issue naturally brings up another performance issue with which it is suitable to conclude this first part of the <code>memcpy</code> discussion. At the other end of the spectrum what if the user finds the analysis of <code>memcpy</code> <strong>too slow</strong> even without <code>-slevel</code> (or with <code>-slevel-function memcpy:0</code> to force a rough analysis of <code>memcpy</code>)?</p>
<p>To this user we say: replace your <code>memcpy</code> function by the one below.</p>
<pre>void *
memcpy(void *dst void *src size_t length)
{
unsigned char *dp = dst;
unsigned char *sp = src;
for (;Frama_C_interval(0 1);)
{
dp[Frama_C_interval(0 length-1)] = sp[Frama_C_interval(0 length-1)];
}
return dst;
}
</pre>
<p>While it it may not be immediately clear why one would want to replace the original executable <code>memcpy</code> with this non-executable one we can at least notice that all the executions of the original <code>memcpy</code> are captured by this one. At each iteration of the original <code>memcpy</code> the condition is either 0 or 1 and one of the characters <code>sp[Frama_C_interval(0 length-1)]</code> is copied to <code>dp[Frama_C_interval(0 length-1)]</code>.</p>
<p>Now to investigate why analyzing this function is faster let us look at the states propagated by the analyzer when analyzing the <code>for</code> loop of the original <code>memcpy</code>. It starts the loop with:</p>
<pre> i ∈ {0; }
dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d completely uninitialized
</pre>
<p>Here is the sequence of states at the point after another char has been copied before <code>dp</code> <code>sp</code> and <code>i</code> are incremented:</p>
<pre> i ∈ {0; }
dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d.p[bits 0 to 7]# ∈ {{ &a ;}} misaligned 0%32
{.p[bits 8 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; }
dp ∈ {{ &d + {0; 1; } ;}}
sp ∈ {{ &s + {0; 1; } ;}}
d.p[bits 0 to 15] ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.p[bits 16 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; }
dp ∈ {{ &d + {0; 1; 2; } ;}}
sp ∈ {{ &s + {0; 1; 2; } ;}}
d.p[bits 0 to 23] ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.p[bits 24 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; 3; }
dp ∈ {{ &d + {0; 1; 2; 3; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; } ;}}
d.p ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; 3; 4; }
dp ∈ {{ &d + {0; 1; 2; 3; 4; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; } ;}}
d{.p; .q[bits 0 to 7]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 8 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..15]
dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; } ;}}
d{.p; .q[bits 0 to 15]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 16 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..16]
dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; 6; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; 6; } ;}}
d{.p; .q[bits 0 to 23]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 24 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..7] ;}}
sp ∈ {{ &s + [0..7] ;}}
d{.p; .q; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..8] ;}}
sp ∈ {{ &s + [0..8] ;}}
d{.p; .q; .i[bits 0 to 7]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.i[bits 8 to 31]; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..15] ;}}
sp ∈ {{ &s + [0..15] ;}}
d{.p; .q; .i; .d[bits 0 to 31]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.d[bits 32 to 63]; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..16] ;}}
sp ∈ {{ &s + [0..16] ;}}
d{.p; .q; .i; .d[bits 0 to 39]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.d[bits 40 to 63]; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..23] ;}}
sp ∈ {{ &s + [0..23] ;}}
d ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
</pre>
<p>In contrast with this long sequence of abstract states going through the same point of the <code>for</code> loop until a fixpoint is finally found the analysis of the modified <code>memcpy</code> reaches the same abstract state in one iteration realizes that it is a fixpoint and goes on with its life:</p>
<pre> dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d ∈
{{ garbled mix of &{a; b; } (origin: Merge {mfast.c:32; }) }} or UNINITIALIZED
</pre>
<p>In short the modified <code>memcpy</code> voluntarily loses information so that finding the loop's fixpoint becomes straightforward. The analysis of the original <code>memcpy</code> has to discover the fixpoint by gradually losing information and seeing what values stick. In fact the process of discovering the fixpoint may overshoot and end up <strong>less</strong> precise as well as slower than the improved <code>memcpy</code>. For imprecise analyses of programs that use <code>memcpy</code> when the user is interested in the verification of the program itself rather than <code>memcpy</code> it is recommended to use the replacement version.</p>
<p><code>memcpy()</code> is one of the few functions standardized as part of C itself instead of an additional API. But that's not what makes it interesting from a static analysis point of view. I think what makes it interesting is that it is used often, and often for tasks that can be described in high-level terms, whereas its implementations range <strong>from the low-level to the horribly low-level</strong>.</p>
<p>Here is a typical <code>memcpy</code> implementation:</p>
<pre>void *
memcpy(void *dst, void *src, size_t length)
{
int i;
unsigned char *dp = dst;
unsigned char *sp = src;
for (i=0; i<length; i++)
*dp++ = *sp++;
return dst;
}
</pre>
<p>And here are snippets of an analysis context that uses it (download <a href="/assets/img/blog/imported-posts/m.c">the entire file</a>):</p>
<pre>struct S { int *p; int *q; int i; double d; int j; };
...
struct S s d;
s.p = &a;
s.q = Frama_C_nondet_ptr(&a &b);
s.i = Frama_C_interval(17 42);
s.j = 456;
s.d = Frama_C_double_interval(3.0 7.0);
memcpy(&d &s sizeof(struct S));
</pre>
<p>Analyzing with <code>frama-c -val .../builtin.c m.c</code> produces the following information about the contents of <code>s</code> which is exactly what we could have expected:</p>
<pre> s.p ∈ {{ &a ;}}
.q ∈ {{ &a ; &b ;}}
.i ∈ [17..42]
.d ∈ [3. .. 7.]
.j ∈ {456; }
</pre>
<p>On the other hand the information for <code>d</code> is much less precise:</p>
<pre> d ∈ {{ garbled mix of &{a; b; } (origin: Misaligned {m.c:11; }) }} or UNINITIALIZED
</pre>
<p>Each field of <code>d</code> is only guaranteed to contain a value that was not computed from base addresses other than <code>&a</code> and <code>&b</code>. The fields of <code>d</code> are not even guaranteed to have been initialized. An ideally precise analysis would tell us that <code>d</code> contains the same values as those displayed for <code>s</code>.</p>
<p>Side remark: if you do run the analysis at home you can ignore the messages about imprecise values for the moment. These are not alarms to act upon but simply informational messages to help trace when the value analysis is imprecise on large programs where the imprecision starts.</p>
<p>If you have been reading this blog from the beginning perhaps as a continuation from the <a href="http://frama-c.com/download/frama-c-value-analysis.pdf" hreflang="en">tutorial</a> you may have noticed by now that whenever the value analysis is not precise enough the first thing to try is option <code>-slevel</code>.</p>
<p><code>frama-c -val .../builtin.c m.c -slevel 100</code></p>
<pre> s.p ∈ {{ &a ;}}
.q ∈ {{ &a ; &b ;}}
.i ∈ [17..42]
.d ∈ [3. .. 7.]
.j ∈ {456; }
d.p ∈ {{ &a ;}}
.q[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 8 to 15]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 16 to 23]# ∈ {{ &a ; &b ;}} misaligned 0%32
.q[bits 24 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32
.i[bits 0 to 7]# ∈ [17..42] misaligned 0%32
.i[bits 8 to 15]# ∈ [17..42] misaligned 0%32
.i[bits 16 to 23]# ∈ [17..42] misaligned 0%32
.i[bits 24 to 31]# ∈ [17..42] misaligned 0%32
.d[bits 0 to 7]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 8 to 15]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 16 to 23]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 24 to 31]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 32 to 39]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 40 to 47]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 48 to 55]# ∈ [3. .. 7.] misaligned 32%64
.d[bits 56 to 63]# ∈ [3. .. 7.] misaligned 32%64
.j ∈ {456; }
</pre>
<p>Good news: all fields of <code>d</code> are now guaranteed to be initialized.
In addition the information available for each field is more precise than before.
The information about fields <code>.j</code> and <code>.p</code> has been perfectly preserved during the <code>memcpy</code> from <code>s</code> to <code>d</code> whereas fields <code>.q</code> <code>.i</code> and <code>.d</code> are only known to contain several 8-bit slices of values. This is what the "misaligned ..." part of each line mean.</p>
<p>For instance line ".i[bits 0 to 7]# ∈ [17..42] misaligned 0%32" uses this mention in order to avoid giving the impression that bits 0 to 7 of field <code>.i</code> contain [17..42]. Instead there was a 32-bit value [17..42] somewhere in memory and a 8-bit slice of that value was copied to the first 8 bits or <code>.i</code>.</p>
<p>The fact that all 4 (respectively 8) slices of each field are followed by the same "misaligned" mention with the same numerical values mean that all slices for that field <strong>could</strong> come from the same value and have been copied in order with the first source slice going to the first destination slice ... However the value analysis is not able to guarantee that this is what happened. It has lost the information that bits 0 to 7 and bits 8 to 15 of <code>.i</code> come from the same value which is why the contents of <code>.i</code> are printed thus instead of just displaying that <code>.i</code> is [17..42].</p>
<p>It is important not to stitch together say bit 0 to 7 of <code>{{ &a ; &b ;}} misaligned 0%32</code> and bit 8 to 31 of <code>{{ &a ; &b ;}} misaligned 0%32</code> into <code>{{ &a ; &b ;}}</code> when you do not know that they come from the same value. Consider the example below:</p>
<pre> int a b;
int *p = Frama_C_nondet_ptr(&a &b);
int *q = Frama_C_nondet_ptr(&a &b);
*(char*)&p = *(char*)&q;
</pre>
<p>After analyzing this piece of code the value analysis predicts for <code>p</code>:</p>
<pre> p[bits 0 to 7]# ∈ {{ &a ; &b ;}} misaligned 0%32
[bits 8 to 31]# ∈ {{ &a ; &b ;}} misaligned 0%32
</pre>
<p>It would be incorrect to predict that <code>p</code> contains {{ &a ; &b ;}} here because the first call to <code>Frama_C_nondet_ptr</code> may return <code>&a</code> while the second one returns <code>&b</code>. Without the information that the two slices originate from the same value it is incorrect to stitch them together as if they did.</p>
<p>The value analysis does better when there is only one possible concrete value inside the set at hand. Fields <code>.p</code> and <code>.j</code> in the original program although they were copied char by char could be reconstituted because the value analysis knows that there is only one possible way to be {{ &a ; }} (or { 456; }) which implies that properly ordered slices when put side by side can only rebuild &a.</p>
<p>This <code>memcpy</code> if it is called with large values of <code>length</code> in the program and the program is analyzed with the <code>-slevel</code> option may occupy a large fraction of the analysis time. This is because although the analyzer does not keep the information that the small bites of values come from the same value which it could use it keeps the information that there exist plenty of equalities between source and destination chars — and does not use them. This should be considered a known issue with an open-ended fix date.</p>
<p>And this performance issue naturally brings up another performance issue with which it is suitable to conclude this first part of the <code>memcpy</code> discussion. At the other end of the spectrum what if the user finds the analysis of <code>memcpy</code> <strong>too slow</strong> even without <code>-slevel</code> (or with <code>-slevel-function memcpy:0</code> to force a rough analysis of <code>memcpy</code>)?</p>
<p>To this user we say: replace your <code>memcpy</code> function by the one below.</p>
<pre>void *
memcpy(void *dst void *src size_t length)
{
unsigned char *dp = dst;
unsigned char *sp = src;
for (;Frama_C_interval(0 1);)
{
dp[Frama_C_interval(0 length-1)] = sp[Frama_C_interval(0 length-1)];
}
return dst;
}
</pre>
<p>While it it may not be immediately clear why one would want to replace the original executable <code>memcpy</code> with this non-executable one we can at least notice that all the executions of the original <code>memcpy</code> are captured by this one. At each iteration of the original <code>memcpy</code> the condition is either 0 or 1 and one of the characters <code>sp[Frama_C_interval(0 length-1)]</code> is copied to <code>dp[Frama_C_interval(0 length-1)]</code>.</p>
<p>Now to investigate why analyzing this function is faster let us look at the states propagated by the analyzer when analyzing the <code>for</code> loop of the original <code>memcpy</code>. It starts the loop with:</p>
<pre> i ∈ {0; }
dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d completely uninitialized
</pre>
<p>Here is the sequence of states at the point after another char has been copied before <code>dp</code> <code>sp</code> and <code>i</code> are incremented:</p>
<pre> i ∈ {0; }
dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d.p[bits 0 to 7]# ∈ {{ &a ;}} misaligned 0%32
{.p[bits 8 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; }
dp ∈ {{ &d + {0; 1; } ;}}
sp ∈ {{ &s + {0; 1; } ;}}
d.p[bits 0 to 15] ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.p[bits 16 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; }
dp ∈ {{ &d + {0; 1; 2; } ;}}
sp ∈ {{ &s + {0; 1; 2; } ;}}
d.p[bits 0 to 23] ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.p[bits 24 to 31]; .q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; 3; }
dp ∈ {{ &d + {0; 1; 2; 3; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; } ;}}
d.p ∈
{{ garbled mix of &{a; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ {0; 1; 2; 3; 4; }
dp ∈ {{ &d + {0; 1; 2; 3; 4; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; } ;}}
d{.p; .q[bits 0 to 7]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 8 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..15]
dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; } ;}}
d{.p; .q[bits 0 to 15]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 16 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..16]
dp ∈ {{ &d + {0; 1; 2; 3; 4; 5; 6; } ;}}
sp ∈ {{ &s + {0; 1; 2; 3; 4; 5; 6; } ;}}
d{.p; .q[bits 0 to 23]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.q[bits 24 to 31]; .i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..7] ;}}
sp ∈ {{ &s + [0..7] ;}}
d{.p; .q; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.i; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..8] ;}}
sp ∈ {{ &s + [0..8] ;}}
d{.p; .q; .i[bits 0 to 7]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.i[bits 8 to 31]; .d; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..15] ;}}
sp ∈ {{ &s + [0..15] ;}}
d{.p; .q; .i; .d[bits 0 to 31]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.d[bits 32 to 63]; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..16] ;}}
sp ∈ {{ &s + [0..16] ;}}
d{.p; .q; .i; .d[bits 0 to 39]; } ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
{.d[bits 40 to 63]; .j; } ∈ UNINITIALIZED
i ∈ [0..23]
dp ∈ {{ &d + [0..23] ;}}
sp ∈ {{ &s + [0..23] ;}}
d ∈
{{ garbled mix of &{a; b; } (origin: Merge {m.c:12; }) }} or UNINITIALIZED
</pre>
<p>In contrast with this long sequence of abstract states going through the same point of the <code>for</code> loop until a fixpoint is finally found the analysis of the modified <code>memcpy</code> reaches the same abstract state in one iteration realizes that it is a fixpoint and goes on with its life:</p>
<pre> dp ∈ {{ &d ;}}
sp ∈ {{ &s ;}}
d ∈
{{ garbled mix of &{a; b; } (origin: Merge {mfast.c:32; }) }} or UNINITIALIZED
</pre>
<p>In short the modified <code>memcpy</code> voluntarily loses information so that finding the loop's fixpoint becomes straightforward. The analysis of the original <code>memcpy</code> has to discover the fixpoint by gradually losing information and seeing what values stick. In fact the process of discovering the fixpoint may overshoot and end up <strong>less</strong> precise as well as slower than the improved <code>memcpy</code>. For imprecise analyses of programs that use <code>memcpy</code> when the user is interested in the verification of the program itself rather than <code>memcpy</code> it is recommended to use the replacement version.</p>
{% endraw %}