2011-09-12-Better-is-the-enemy-of-good...-sometimes.html

---
layout: post
author: Pascal Cuoq
date: 2011-09-12 05:38 +0200
categories: floating-point value value-builtins
format: xhtml
title: "Better is the enemy of good... sometimes"
summary: 
---
{% raw %}
<p>This post is about widening. This technique was shown in the second part of 
a <a href="/index.php?post/2011/01/27/On-memcpy-1">previous post about memcpy()</a>  
where it was laboriously used to analyze imprecisely 
function <code>memcpy()</code> as it is usually written. 
The value analysis in Frama-C has the ability to summarize loops 
in less time than they would take to execute. Indeed  it can 
analyze in finite time loops for which it is not clear whether 
they terminate at all. This is the result of widening  the 
voluntary loss of precision in loops. 
As a side-effect  widening can produce some 
counter-intuitive results. This post is about these 
counter-intuitive results.</p> 
<h2>The expected: better information in  better information out</h2> 
<p>Users generally expect that 
the more you know about the memory state before 
the interpretation of a piece of program  the more you should 
know about the memory state after this piece of program. 
Users are right to expect this  and indeed  it is generally true:</p> 
<pre>int u; 
main(){ 
  u = Frama_C_interval(20  80); 
  if (u &amp; 1) 
    u = 3 * u + 1; // odd 
  else 
    u = u / 2; // even 
} 
</pre> 
<p>The command <code>frama-c -val c.c share/builtin.c</code> shows the final value of <code>u</code> to be in <code>[10..241]</code>. 
If the first line of <code>main()</code> is changed to make the input value range 
over <code>[40..60]</code> instead of <code>[20..80]</code>  then the output 
interval for <code>u</code> becomes <code>[20..181]</code>. This seems normal : 
you know more about variable <code>u</code> going in (all the values in <code>[40..60]</code> 
are included in <code>[20..80]</code>)  so you know more about <code>u</code> 
coming out (<code>[20..181]</code> is included in <code>[10..241]</code>). 
Most of the analyzer's weaknesses  such as  in this case  the inability 
to recognize that the largest value that can get multiplied by 3 is 
respectively 79 or 59  do not compromise this general principle. But widening 
does.</p> 
<h2>Better information in  worse information out</h2> 
<p>Widening happens when there is a loop in the program  such as in 
the example below:</p> 
<pre>int u  i; 
main(){   
  u = Frama_C_interval(40  60); 
  for (i=0; i&lt;10; i++) 
    u = (25 + 3 * u) / 4; 
} 
</pre> 
<pre>$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [15..60] 
</pre> 
<p>Analyzing this program with the 
default options produce the interval <code>[15..60]</code> for <code>u</code>. 
Now  if we change the initial set for <code>u</code> to 
be <code>[25..60]</code>  an interval that contains all the values 
from <code>[40..60]</code> and more:</p> 
<pre>$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [25..60] 
</pre> 
<p>By using a larger  less precise set as the input of this analysis  
we obtain a more precise result. This is counter-intuitive  
although it is rarely noticed in practice.</p> 
<h2>Better modelization  worse analysis</h2> 
<p>There are several ways to make this unpleasant situation less likely to 
be noticed. Only a few of these are implemented in Frama-C's 
value analysis. Still  it was difficult enough to find a short 
example to illustrate my next point  and the program is 
only as short as I was able to make it. Consider the following 
program:</p> 
<pre>int i  j; 
double ftmp1; 
double Frama_C_cos(double); 
double Frama_C_cos_precise(double); 
#define TH 0.196349540849361875 
#define NBC1 14 
#define S 0.35355339 
double cos(double x) 
{ 
  return Frama_C_cos(x); 
} 
main(){   
  for (i = 0; i &lt; 8; i++) 
    for (j = 0; j &lt; 8; j++) 
      { 
        ftmp1 = ((j == 0) ? S : 0.5) * 
          cos ((2.0 * i + 1.0) * j * TH); 
	ftmp1 *= (1 &lt;&lt; NBC1); 
      } 
} 
</pre> 
<p>If you analyze the above program with a smartish 
modelization of function <code>cos()</code>  one that recognizes 
when its input is an interval <code>[l..u]</code> of floating-point 
values on which <code>cos()</code> is monotonous  and computes 
the optimal output in this case  you get 
the following results:</p> 
<pre>$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-3.40282346639e+38 .. 8192.] 
</pre> 
<p>Replacing the <code>cos()</code> modelization by 
a more naive one that returns <code>[-1. .. 1.]</code> 
as soon as its input set isn't a singleton  you get 
the improved result:</p> 
<pre>$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-8192. .. 8192.] 
</pre> 
<p>This is slightly different from the <code>[25..60]</code> 
example  in that here we are not changing the input 
set but the modelization of one of the functions involved 
in the program. Still  the causes are the same  and the 
effects just as puzzling when they are noticeable. 
And this is why  until version Carbon  only 
the naive modelization of <code>cos()</code> was 
provided.</p> 
<p>In the next release  there will be  in addition to the naive versions  <code>Frama_C_cos_precise()</code> and 
<code>Frama_C_sin_precise()</code> built-ins for when you want them. Typically  this will be for the more precise analyses that do not involve widening.</p>
 <p>This post is about widening. This technique was shown in the second part of 
a <a href="/index.php?post/2011/01/27/On-memcpy-1">previous post about memcpy()</a>  
where it was laboriously used to analyze imprecisely 
function <code>memcpy()</code> as it is usually written. 
The value analysis in Frama-C has the ability to summarize loops 
in less time than they would take to execute. Indeed  it can 
analyze in finite time loops for which it is not clear whether 
they terminate at all. This is the result of widening  the 
voluntary loss of precision in loops. 
As a side-effect  widening can produce some 
counter-intuitive results. This post is about these 
counter-intuitive results.</p> 
<h2>The expected: better information in  better information out</h2> 
<p>Users generally expect that 
the more you know about the memory state before 
the interpretation of a piece of program  the more you should 
know about the memory state after this piece of program. 
Users are right to expect this  and indeed  it is generally true:</p> 
<pre>int u; 
main(){ 
  u = Frama_C_interval(20  80); 
  if (u &amp; 1) 
    u = 3 * u + 1; // odd 
  else 
    u = u / 2; // even 
} 
</pre> 
<p>The command <code>frama-c -val c.c share/builtin.c</code> shows the final value of <code>u</code> to be in <code>[10..241]</code>. 
If the first line of <code>main()</code> is changed to make the input value range 
over <code>[40..60]</code> instead of <code>[20..80]</code>  then the output 
interval for <code>u</code> becomes <code>[20..181]</code>. This seems normal : 
you know more about variable <code>u</code> going in (all the values in <code>[40..60]</code> 
are included in <code>[20..80]</code>)  so you know more about <code>u</code> 
coming out (<code>[20..181]</code> is included in <code>[10..241]</code>). 
Most of the analyzer's weaknesses  such as  in this case  the inability 
to recognize that the largest value that can get multiplied by 3 is 
respectively 79 or 59  do not compromise this general principle. But widening 
does.</p> 
<h2>Better information in  worse information out</h2> 
<p>Widening happens when there is a loop in the program  such as in 
the example below:</p> 
<pre>int u  i; 
main(){   
  u = Frama_C_interval(40  60); 
  for (i=0; i&lt;10; i++) 
    u = (25 + 3 * u) / 4; 
} 
</pre> 
<pre>$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [15..60] 
</pre> 
<p>Analyzing this program with the 
default options produce the interval <code>[15..60]</code> for <code>u</code>. 
Now  if we change the initial set for <code>u</code> to 
be <code>[25..60]</code>  an interval that contains all the values 
from <code>[40..60]</code> and more:</p> 
<pre>$ frama-c -val loop.c share/builtin.c 
... 
[value] Values for function main: 
          u ∈ [25..60] 
</pre> 
<p>By using a larger  less precise set as the input of this analysis  
we obtain a more precise result. This is counter-intuitive  
although it is rarely noticed in practice.</p> 
<h2>Better modelization  worse analysis</h2> 
<p>There are several ways to make this unpleasant situation less likely to 
be noticed. Only a few of these are implemented in Frama-C's 
value analysis. Still  it was difficult enough to find a short 
example to illustrate my next point  and the program is 
only as short as I was able to make it. Consider the following 
program:</p> 
<pre>int i  j; 
double ftmp1; 
double Frama_C_cos(double); 
double Frama_C_cos_precise(double); 
#define TH 0.196349540849361875 
#define NBC1 14 
#define S 0.35355339 
double cos(double x) 
{ 
  return Frama_C_cos(x); 
} 
main(){   
  for (i = 0; i &lt; 8; i++) 
    for (j = 0; j &lt; 8; j++) 
      { 
        ftmp1 = ((j == 0) ? S : 0.5) * 
          cos ((2.0 * i + 1.0) * j * TH); 
	ftmp1 *= (1 &lt;&lt; NBC1); 
      } 
} 
</pre> 
<p>If you analyze the above program with a smartish 
modelization of function <code>cos()</code>  one that recognizes 
when its input is an interval <code>[l..u]</code> of floating-point 
values on which <code>cos()</code> is monotonous  and computes 
the optimal output in this case  you get 
the following results:</p> 
<pre>$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-3.40282346639e+38 .. 8192.] 
</pre> 
<p>Replacing the <code>cos()</code> modelization by 
a more naive one that returns <code>[-1. .. 1.]</code> 
as soon as its input set isn't a singleton  you get 
the improved result:</p> 
<pre>$ frama-c -val idct.c share/builtin.c 
... 
          ftmp1 ∈ [-8192. .. 8192.] 
</pre> 
<p>This is slightly different from the <code>[25..60]</code> 
example  in that here we are not changing the input 
set but the modelization of one of the functions involved 
in the program. Still  the causes are the same  and the 
effects just as puzzling when they are noticeable. 
And this is why  until version Carbon  only 
the naive modelization of <code>cos()</code> was 
provided.</p> 
<p>In the next release  there will be  in addition to the naive versions  <code>Frama_C_cos_precise()</code> and 
<code>Frama_C_sin_precise()</code> built-ins for when you want them. Typically  this will be for the more precise analyses that do not involve widening.</p>
{% endraw %}