---
layout: post
author: Pascal Cuoq
date: 2010-10-17 12:59 +0200
categories: derived-analysis unspecified-behavior value
format: xhtml
title: "Unspecified behaviors and derived analyses"
summary: 
---
{% raw %}
<h2>Prologue</h2> 
<p><strong>The C standard(s) specifies a minimum of things</strong> that all C compilers must agree on. For the sake of efficiency, many syntactically correct constructs are left without an unambiguous meaning.</p> 
<p>The worst way for a construct not to have an unambiguous meaning is to be <em>undefined behavior</em>. An example of undefined behavior is invalid pointer access. Not all platforms have the required MMU, and even when they do, the MMU does not catch all invalid memory uses. Accessing an invalid pointer is not guaranteed to crash the program cleanly. A program that does so may crash cleanly, or continue executing in an inconsistent state. The program may crash a little bit later because of the inconsistency, or it may appear to work fine and return a wrong result, or it may reformat your hard drive. All these behaviors and others are encompassed by "undefined behavior".</p> 
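<p>As a minimal sketch of this (my illustration, not an example from the standard), consider reading through a dangling pointer. Nothing obliges an implementation to detect the invalid access:</p> 
<pre>
#include &lt;stdlib.h&gt;

int main(void)
{
    int *p = malloc(sizeof *p);
    free(p);
    /* Undefined behavior: p dangles after free(). Depending on the
       platform and the allocator, this read may crash immediately,
       return a stale value, or return garbage that silently corrupts
       later computations. */
    return *p;
}
</pre>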
<h2>Compilers and compilation-platform-specific static analysis</h2> 
<p>Recently, C compilers have started to take advantage of undefinedness in the C99 standard for the sake of aggressive optimization. They apply program transformations that are <strong>only valid in the case of defined behaviors</strong>, and leave the program doing strange, unpredictable things in other cases. John Regehr has a very good explanation of C's undefined behaviors that spans <a href="http://blog.regehr.org/archives/213" hreflang="en">three blog posts</a>.</p> 
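<p>A classic illustration (my sketch, not an example taken from those posts) is an overflow check that the overflow itself invalidates:</p> 
<pre>
#include &lt;limits.h&gt;

/* Intended as a saturating increment, but the test relies on signed
   overflow, which is undefined behavior. A compiler may assume the
   overflow never happens, conclude that x + 1 &lt; x is always false,
   and remove the branch entirely. */
int increment_saturating(int x)
{
    if (x + 1 &lt; x)
        return INT_MAX;
    return x + 1;
}
</pre>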
<p>There are other, more benign ways for a construct not to have an unambiguous meaning, though. <em>Implementation-defined behaviors</em> must be documented and consistent for a given compiler. In the case of <em>unspecified behaviors</em>, one of several plausible alternatives must happen. In Frama-C, we have reclassified some unspecified, and even some undefined, behaviors as implementation-defined. For instance, the value analysis assumes by default that signed arithmetic overflows have been put there on purpose by the programmer and that he intended 2's complement modulo results. In this default configuration, the value analysis does not warn about these overflows and propagates 2's complement modulo results. John Regehr convinced us, however, that this was not satisfactory, so there is now an option <code>-val-signed-overflow-alarms</code> for when the programmer is not supposed to rely on this behavior (signed arithmetic overflows are technically undefined behavior). The Jessie plug-in allows roughly the same distinction, with the defaults reversed (by default, you get alarms for integer overflows). 
In short, a Frama-C analysis session is parameterized by a <strong>description of the compilation platform</strong> that includes genuine or reclassified implementation-defined choices.</p> 
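<p>To make the overflow choice just described concrete, here is a hypothetical fragment of the kind it affects, a hash function deliberately relying on wrap-around:</p> 
<pre>
/* Hypothetical example: a hash that counts on signed arithmetic
   wrapping around, 2's complement style. Under the default platform
   description, the value analysis silently propagates the wrapped
   results; with -val-signed-overflow-alarms, it emits an alarm for
   the possible overflow instead. */
int hash(const char *s)
{
    int h = 0;
    while (*s)
        h = 31 * h + *s++;
    return h;
}
</pre>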
<p>That still leaves, of course, some unspecified and undefined behaviors that the value analysis or Jessie plug-ins do indeed warn about, and prove the absence of when they do not warn about them. Each construct that may cause some kind of unspecifiedness or undefinedness is the object of a warning containing the word <code>assert</code> in the value analysis log. Terms and conditions apply.</p> 
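<p>For instance (an illustrative sketch, not verbatim tool output), a possible division by zero gives rise to such a warning:</p> 
<pre>
int quotient(int a, int b)
{
    /* If the analysis cannot establish that b is nonzero here, it
       emits an alarm along the lines of "assert b != 0;". Its other
       results then hold only for executions in which the emitted
       assertion is true. */
    return a / b;
}
</pre>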
<p>This post is about the one use case where the user does not have to worry about the alarms. So if you were tired of the ongoing thread about the verification of Skein-256, good news! This is something else.</p> 
<h2>Derived analyses</h2> 
<p>Well, I say "one" use case, but it's really a family of use cases. I am talking about all the <strong>analyses that help the user comprehend the code</strong>, and that can therefore not mandate that all the alarms have been eliminated through the kind of laborious inspection and refinement illustrated in the Skein verification thread. Instead, the values, and therefore the results of any analysis that relies on them, are guaranteed to apply to well-defined executions only (executions that do not falsify any of the assertions that have been emitted). As long as this is kept in mind, it may not even be necessary to look at these alarms. In fact, it is standard for high-level analyzers of C programs to <strong>assume the absence of undefined behaviors</strong>, or to apply only to executions without undefined behaviors. Fast alias analyses intended to scale to millions of lines of C, for instance, assume that you do not transmit an address to pointer <code>p</code> by writing it in <code>*(&amp;q+1)</code>, where <code>q</code> is the variable just below <code>p</code> in memory. These analyses, and any other analyses that rely on them, do not tell you whether you have reasons to worry about this happening. And if you should worry, they do not give you a list of dangerous constructs in the target program. The value analysis provides such a list of alarms, but you can ignore it if you wish, and then you get results with similar caveats.</p> 
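<p>Here is a sketch of that layout-dependent trick (hypothetical code; whether the two variables really end up adjacent is exactly what the standard does not guarantee):</p> 
<pre>
int x = 42;

int main(void)
{
    int *q = 0;
    int *p = 0;
    /* Undefined, layout-dependent trick: if p happens to be placed
       just above q in memory, this write smuggles &amp;x into p without
       ever naming p. A fast alias analysis assumes this never
       happens and would still consider p to be null here. */
    *(&amp;q + 1) = &amp;x;
    return p ? *p : -1;
}
</pre>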
<p>Well, almost. We are finally arriving at the core of the problem.</p> 
<h2>What is a well-defined execution?</h2> 
<p>The value analysis is more precise and more aggressive than the typical large-scale alias analysis in its handling of what look like possible quirks in the analyzed program. The reason the <code>-val-signed-overflow-alarms</code> option mentioned above is not enabled by default is that several test programs became meaningless with it. The analysis was in effect saying "look, this program is certain to produce an overflow here at the third line; there is no point in going further until this has been resolved", and refused to analyze the rest of the program. The typical large-scale context-insensitive, path-insensitive alias analysis does not have this problem, but that's only because it is too imprecise to have it. It is still making assumptions, such as that you do not access <code>p</code> when you read or write through <code>*(&amp;q+1)</code>.</p> 
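<p>A contrived sketch of such a situation (hypothetical, not one of the actual test programs):</p> 
<pre>
#include &lt;limits.h&gt;

int main(void)
{
    int x = INT_MAX;
    int y = x + 1;  /* certain signed overflow: with overflow alarms
                       enabled, the analysis reports it and knows that
                       no well-defined execution continues past here */
    int z = y / 2;  /* ... so everything below is never analyzed */
    return z;
}
</pre>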
<p>So does this mean that all high-level analyses of C programs are meaningless, unless the absence of undefined behaviors in the target program is also verified (1)? In the style of Dan Simmons' Hyperion cycle, we are going to leave this crucial question for another volume, that is, a future blog post.</p> 
<p>(1) which it rarely is in practice. As pointed out earlier, most of them do not even tell you whether you should worry and what to worry about.</p>
{% endraw %}