Newer
Older
\usepackage{microtype}
\usepackage{lmodern}
\usepackage{booktabs}
\usepackage{calc}
\usepackage{multirow}
% Commandes pour mettre des ref dans l'index :
\newcommand{\idb}[1]{\textbf{#1}}
\newcommand{\indextxt}[1]{\index{#1}{\bf #1}}
\newcommand{\indexdef}[1]{\index{#1|idb}{{\bf #1}}}
\newcommand{\indextxtdef}[2]{\index{#2|idb}{{\bf #1}}}
\renewcommand{\labelitemii}{$\triangleright$}
\newcommand{\bropen}{\mbox{\tt [}}
\newcommand{\brclose}{\mbox{\tt ]}}
\newcommand{\cbopen}{\mbox{\tt \{}}
\newcommand{\cbclose}{\mbox{\tt \}}}
\newcommand{\framacversion}%
{\input{../../VERSION} (\input{../../VERSION_CODENAME}\unskip)}
\lstset{literate=%
{∈}{{$\in$ }}1
}
\title{The \Eva{} plug-in}
\author{David Bühler, Pascal Cuoq and Boris Yakobowski. \\
With Matthieu Lemerre, André Maroneze,
Valentin Perrelle and Virgile Prevosto}
\fcversion{\framacversion}
\cclicence{by-sa}
\copyrightstarts{2011}
%==============================================================================
\begin{document}
\sloppy
\emergencystretch 3em
\chapter*{Foreword}
\markright{}
\addcontentsline{toc}{chapter}{Foreword}
This is the documentation of the \FramaC plug-in \Eva.
The content of this document corresponds to the version \framacversion, released
on \today, of \FramaC. However the development of \FramaC is still ongoing:
features described here may still evolve in the future.
\section*{Acknowledgements}
%\addcontentsline{toc}{chapter}{Acknowledgements}
\medskip
\acknowledgeANR{
the French ANR projects U3CAT~(08-SEGI-021-01)
}
\chapter{Introduction}\label{introduction}
\vspace{2cm}
{\em Frama-C is a modular static analysis framework for the C language.\\
This manual documents the \Eva{} (for \emph{Evolved Value Analysis}) plug-in of Frama-C.}
\vspace{2cm}
The \Eva{} plug-in automatically computes sets of possible
values for the variables of an analyzed program. \Eva{}
warns about possible run-time errors in the analyzed program.
Lastly, synthetic information
about each analyzed function can be computed automatically from the
values provided by \Eva{}: these functionalities (input
variables, output variables, and information flow) are also documented
here. \bigskip
The framework, the \Eva{} plug-in and the other plug-ins
documented here are all Open Source software. They can be
downloaded from \url{http://frama-c.com/}.
\bigskip
In technical terms, \Eva{} is context-sensitive and
path-sensitive. In non-technical terms, this means that it tries to
offer precise results for a large set of C programs. Heterogeneous
pointer casts, function pointers, and floating-point computations are
handled.
\section{First contact}
Frama-C comes with two interfaces: batch and interactive.
The interactive graphical interface of Frama-C displays
a normalized version of the analyzed source code.
In this interface, the \Eva{} plug-in allows the user
to select an expression in the
code and observe an over-approximation of the set of
values this expression can take at run-time.
Here is a simple C example:
\listinginputcaption{1}{\texttt{introduction.c}}{examples/introduction.c}
If either interface of Frama-C is launched with options
\lstinline|-eva introduction.c|,
the \Eva{} plug-in is able to guarantee that at each passage through
the \lstinline|return| statement of
function \lstinline|f|, the global variables \lstinline+y+ and
\lstinline+z+ each contain either 1 or 3.
At the end of function \lstinline|main|,
it indicates that \lstinline|y|
necessarily contains 4, and the value of \lstinline|z| is again 1 or 3.
When the plug-in indicates the value of \lstinline|y| is 1 or 3
at the end of function \lstinline|f|, it implicitly computes the union
of all the values in \lstinline|y|
at each passage through this program point throughout any execution.
In an actual execution of this deterministic program,
there is only one passage though the end
of function \lstinline|main|, and therefore only one value for \lstinline|z|
at this point.
The answer given by \Eva{} is approximated but correct (the actual
value, 3, is among the proposed values).
The theoretical framework on which \Eva{} is founded is called
Abstract Interpretation and has been the subject of extensive research
during the last forty years.
\section{Run-time errors and the absence thereof}
An analyzed application may contain run-time errors: divisions by zero,
invalid pointer accesses,\ldots
\listinginputcaption{1}{\texttt{rte.c}}{examples/rte.c}
When launched with \lstinline|frama-c -eva rte.c|,
\Eva{} emits a warning about an out-of-bound access at line 5:
\begin{logs}
[eva:alarm] rte.c:5: Warning: accessing out of bounds index. assert i < 10;
\end{logs}
There is in fact an out-of-bounds access at this line in the program.
It can also happen that, because of approximations during its computations,
Frama-C emits warnings for constructs
that do not cause any run-time errors. These are called ``false alarms''.
On the other hand, the fact that \Eva{}
computes correct, over-approximated sets of possible values prevents it
from remaining silent on a program that contains a run-time error. For
instance, a particular division in the analyzed program
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
as soon as the set of possible values for the divisor
contains zero. Only
if the set of possible values computed by \Eva{}
does not contain zero is the warning omitted, and
that means that the divisor really cannot be null at run-time.
\section{From Value Analysis to \Eva{}}
The \Eva{} plug-in was previously called the \emph{Value Analysis plug-in}.
Following major changes in its expressivity and overall precision,
the plugin was subsequently renamed to \emph{Evolved Value Analysis}, or
\Eva{} for short. Those changes were first made available with the
Aluminium version of Frama-C. They are presented in section~\ref{sec:eva}.
\section{Other analyses based on \Eva{}}
Frama-C also provides
synthetic information on the behavior of analyzed functions:
inputs, outputs, and dependencies. This information is computed
from the results of the \Eva{} plug-in,
and therefore some familiarity with \Eva{}
is necessary to get the most of these computations.
\chapter{Tutorial}\label{tutorial}
\vspace{2cm}
{\em Some use cases for \Eva{}\ldots}
\vspace{2cm}
\section{Introduction}
Throughout this tutorial, we will see on a single example how to use
\Eva{} for the following tasks :
\begin{enumerate}
\item to get familiar with foreign code,
\item to produce documentation automatically,
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
\item to guarantee the absence of bugs.
\end{enumerate}
It is useful to stick to a single example in this tutorial, and
there is a natural progression to the list of results above,
but in real life, a single person would generally focus on
only one or two of these four tasks for a given
codebase. For instance, if you need Frama-C's help to reverse
engineer the code as in tasks 1 and 2, you have
probably not been provided with the quantity of documentation
and specifications
that is appropriate for meaningfully carrying out task 4.
Frama-C helps you to achieve tasks 1-3 in less time than you would
need to get the same results using the traditional approach of writing
tests cases. Task 4, formally guaranteeing the absence of bugs, can in
the strictest sense not be achieved at all using tests for two
reasons. Firstly, many forms of bugs that occur in a C program
(including buffer overflows) may cause the behavior of the program to
be non-deterministic. As a consequence, even when a test suite comes
across a bug, the faulty program may appear to work correctly during
tests and fail later, after it has been deployed. Secondly, the
notion of coverage of a test suite in itself is an invention made
necessary because tests aren't usually exhaustive. Assume a
function's only inputs are two 32-bit integers, each allowed to take
the full range of their possible values. Further assume that this
function only takes a billionth of a second to run (a couple of cycles
on a 2GHz processor). A pencil and the back of an envelope show that
this function would take 600 years to test exhaustively. Testing
coverage criteria can help decide a test suite is ``good enough'', but
there is an implicit assumption in this reasoning. The assumption is
that a test suite that satisfies the coverage criteria finds all the
bugs that need to be found, and this assumption is justified
empirically only.
\section{Target code: Skein-256}
A single piece of code, the reference implementation for the Skein
hash function, is used as an example throughout this document. As of
this writing, this implementation is available at
\url{http://www.schneier.com/code/skein.zip}. The Skein function is
Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir
Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker's entry in the
NIST cryptographic hash algorithm competition for SHA-3.
Cryptographic hash functions are one of the basic building blocks from
which cryptographic protocols are designed. Many cryptographic
protocols are designed with the assumption that an implementation for
a function $h$ from strings to fixed-width integers is available with
the following two properties:
\begin{itemize}
\item it is difficult to find two distinct, arbitrary strings $s_1$ and $s_2$
such that $h(s_1)=h(s_2)$,
\item for a given integer $i$, it is difficult to build a string $s$
such that $h(s)=i$.
\end{itemize}
A function with the above two properties is called a cryptographic hash
function. It is of the utmost importance that the function actually
chosen in the implementation of a cryptographic application satisfies the
above properties! Any weakness in the hash function can be
exploited to corrupt the application.
Proofs of security for
cryptographic hash functions are complicated mathematical affairs.
These proofs are made using a mathematical definition
of the function. Using static analysis for verifying that a piece
of code corresponds to the mathematical model
which was the object of the security proofs is an interesting
topic but is outside the scope of this tutorial.
In this document, we will not be using \Eva{} for
verifying cryptographic properties.
We will, however, use \Eva{} to familiarize ourselves
with Skein's reference implementation, and we will see that
it can be a more useful tool than, say, a C
compiler, to this effect. We will also use \Eva{} to look
for bugs in the implementation of Skein, and finally prove that the
functions that implement Skein may never cause a run-time error for a
general use pattern. Because of the numerous pitfalls of the C
programming language, any amount of work at the
mathematical level cannot exclude the possibility of problems
such as buffer overflows in the implementation.
It is a good thing to be able to rule these out with \Eva{}.
\Eva{} is most useful for embedded or embedded-like
code. Although the Skein library does not exactly fall in this category,
it does not demand dynamic allocation and uses few functions from external
libraries, so it is well suited to this analysis.
\section{Using \Eva{} for getting familiar with Skein}
\subsection{Writing a test}
After extracting the archive \lstinline|skein_NIST_CD_010509.zip|,
listing the files in\\
\lstinline|NIST/CD/Reference_Implementation| shows:
\begin{listing-nonumber}
SHA3api_ref.c
SHA3api_ref.h
brg_endian.h
brg_types.h
skein.c
skein.h
skein_block.c
skein_debug.c
skein_debug.h
skein_port.h
\end{listing-nonumber}
The most natural place to go next is the file \lstinline|skein.h|,
since its name hints that this is the header with the
function declarations that should be called from outside the library.
Scanning the file quickly, we may notice declarations such as
\begin{listing-nonumber}
typedef struct /* 256-bit Skein hash context structure */
{
[...]
} Skein_256_Ctxt_t;
/* Skein APIs for (incremental) "straight hashing" */
int Skein_256_Init (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
[...]
int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg,
size_t msgByteCnt);
[...]
int Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
\end{listing-nonumber}
The impression we get at first glance is that the hash
of an 80-char message containing
the string ``People of Earth, your attention, please''
can be computed as simply as declaring a variable of
type \lstinline|Skein_256_Ctxt_t|, letting \lstinline|Skein_256_Init|
initialize it, passing \lstinline|Skein_256_Update| a representation
of the string, and calling \lstinline|Skein_256_Final| with the
address of a buffer where to write the hash value.
Let us write a C program that does just that:
\listinginputcaption{1}{\texttt{main\_1.c}}{tutorial/main_1.c}
In order to make the test useful, we have to print
the obtained hash value. Because the result we are interested in is
an 8-byte number, represented as a char array of
arbitrary characters (some of them non-printable), we cannot
use string-printing functions, hence the \lstinline|for| loop at lines 17-18.
\goodbreak
With luck, the compilation goes smoothly and we obtain
a hash value:
\begin{frama-c-commands}
gcc *.c
./a.out
\end{frama-c-commands}
\begin{logs}
215
215
189
207
196
124
124
13
\end{logs}
\begin{important}
Your actual result may differ from the one above;
this will be explained later.
\end{important}
\subsection{First try at an analysis using \Eva{}}
Let us now see how \Eva{} works on the same example.
\Eva{} can be launched with the following command.
The analysis should not take more than a few seconds:
\begin{frama-c-commands}
frama-c -eva main_1.c > log
\end{frama-c-commands}
Frama-C has its own model of the target platform (the default target is
a little-endian 64-bit platform). It also uses the host system's preprocessor.
If you want to do the analysis for a different platform than the host platform,
you need to provide Frama-C with a way to pre-process the files as they
would be during an actual compilation.
There are analyses where the influence of host platform parameters
is not noticeable. The analysis we are embarking on is not one of them.
If you pre-process the Skein source code with a 32-bit compiler and
then analyze it with Frama-C targeting its default 64-bit platform,
the analysis will be meaningless and you won't be able to make sense of
this tutorial.
If you are using a 32-bit compiler, simply match
Frama-C's target platform with your host header files by systematically adding
the option \verb|-machdep x86_32| to all commands in
this tutorial.
The ``\lstinline|> log|'' part of the command sends all
the messages emitted by Frama-C into a file named
``\lstinline|log|''. \Eva{} is verbose for a number of reasons that
will become clearer later in this tutorial.
The best way to make sense of the information
produced by the analysis is
to send it to a log file.
There is also a Graphical User Interface
for creating analysis projects and
Note that you cannot {\em edit} the source code in the GUI, only visualize it.
Most of the information present in the console is also available in the GUI,
but presented differently. Each interface (terminal or GUI) is better suited
for a specific set of tasks.
One way to create an analysis project in the GUI
is to pass the command \lstinline|frama-c-gui| the same options
that would be passed to \lstinline|frama-c|. In this first example,
the command \lstinline|frama-c-gui -eva main_1.c| launches
the same analysis and then opens the GUI for inspection of the results.
\begin{important}
For simple projects like this tutorial, running everything at once
is perfectly suitable; however, for larger analyses, we recommend the
three-step approach presented in section~\ref{three-step}.
\end{important}
The initial syntactic analysis and symbolic link phases
of Frama-C may find issues such as
inconsistent types for the same symbol in different files.
Because of the way separate compilation is implemented,
the issues are not detected by standard C compilers.
It is a good idea to check for these issues in the first
lines of the log.
\subsection{Missing functions}
Since we are trying out the library for the first time,
something else to look for in the log file is the list of functions for
which the source code is missing. \Eva{} requires
either a body or a specification for each function it analyzes.
\begin{frama-c-commands}
grep "Neither code nor specification" log
\end{frama-c-commands}
This should match lines indicating functions that are both undefined
(without source) and that have no specification in Frama-C's standard library.
In our example, we obtain the following lines in the log:
\begin{logs}
[kernel:annot:missing-spec] main_1.c:13: Warning:
Neither code nor specification for function Skein_256_Init,
generating default assigns from the prototype
[eva] using specification for function Skein_256_Init
\end{logs}
The same messages are produced for functions \verb|Skein_256_Update| and
\verb|Skein_256_Final|.
For each function, there are two messages: first, a warning about the absence
of either code or ACSL specification for the function, stating
that an {\em assigns} clause will be generated for the function
(the {\em assigns} clause will be explained in detail later, in
section~\ref{annot_assigns}). Second, a message stating that \Eva{} will use a
specification for the missing function.
What happens is that, without code or specification, Frama-C can only
{\em guess} what each function does from the function's prototype,
which is both inaccurate and likely incorrect. \Eva{} needs a specification
for each function without body reached during the analysis, so one is created
on the fly and then used, but results are likely to be unsatisfactory.
\begin{important}
Absence of code or ACSL specification is a major issue which often renders
the analysis incorrect. For this reason, we recommend converting this warning
into an error, in order to spot it immediately.
The {\em analysis scripts template}, mentioned in the Frama-C User
Manual~\cite{FCUserMan}, includes it by default.
\end{important}
To fix the issue, we only need to give Frama-C all of the C sources in
the directory:
\begin{frama-c-commands}
frama-c -eva *.c > log
\end{frama-c-commands}
No further warnings about missing functions are emitted. We can now focus on
functions without bodies:
\begin{shell}
grep "using specification for function" log
[eva] using specification for function printf_va_1
\end{shell}
This \verb|printf_va_1| function is not present directly in the source code;
it is a specialization of a call to the variadic function \verb|printf|. This
specialization is performed by the \textsf{Variadic} plugin\footnote{The
\textsf{Variadic} plugin is described in more detail in the Frama-C User
Manual~\cite{FCUserMan}.}.
All we need to know for now is that the plugin handles calls to variadic functions
and produces sound specifications for them, which are then used by \Eva{}.
This call to \verb|printf| has no observable effects for the analysis anyway,
so we do not have anything to be concerned with. It is still a good idea to
check the specification of each function without body used during the analysis,
since the overall correctness depends on them. This can be done using the
GUI or, on the command line, with option \verb|-print|, which outputs the
Frama-C normalized source code, including ACSL specifications and
transformations performed by the \textsf{Variadic} plugin.
It is also possible to obtain a list of
missing function definitions by using the command:
\begin{frama-c-commands}
frama-c -metrics *.c
\end{frama-c-commands}
This command computes, among other pieces of information,
a list of missing functions
(under the listing ``Undefined and unspecified functions'')
using a syntactic analysis produced by the \textsf{Metrics} plugin.
It is not exactly equivalent to
grepping the log of \Eva{} because it lists
all the functions that are missing, while the log of
\Eva{} only cites the functions that would have
been necessary to an analysis. When
analyzing a small part of a large codebase, the latter list
may be much shorter than the former, in which case
relying on the information displayed by \lstinline|-metrics| would
mean spending time hunting for functions that are not actually
necessary.
By default, \textsf{Metrics} does not list functions from the standard C
library. This can be done by adding option \verb|-metrics-libc|, which would
produce something similar to:
\begin{logs}
Specified-only functions (94)
=======================
[...]
memcpy (21 calls); (...) memset (27 calls); (...) printf_va_1 (1 call);
\end{logs}
The fact that these functions are marked as {\em Specified-only}
means that \Eva{} will use their ACSL specifications (which are already present
in Frama-C's libc) instead of their code (which is absent). When running \Eva{},
some of these functions result in an output similar to the following:
\begin{logs}
[eva] FRAMAC_SHARE/libc/string.h:118:
cannot evaluate ACSL term, unsupported ACSL construct: logic function memset
\end{logs}
Note that this message is {\em not} a warning (it does not begin with the word
``Warning'') neither an alarm (it is not prefixed with \verb|[eva:alarm]|). It
is an informative message related to Frama-C's libc specifications that can be
safely ignored\footnote{For the interested reader:
some of the more thorough ACSL specifications in Frama-C's
\texttt{<string.h>} are useful for plugins such as \textsf{WP},
but are not yet fully interpreted by \Eva{}. Besides, \Eva{} has builtins which
improve precision and efficiency of the analysis of such functions, beyond what
ACSL specifications can provide.} and will likely disappear in future Frama-C
releases.
\subsection{Interpreting the output of the analysis}
By default, \Eva{}'s log emits several messages during the analysis, which are
essential for a deep understanding of the issues which may happen, but on
overview, these are the main components of the log:
\begin{itemize}
\item parsing and related messages (in our case, simply the list of files being
parsed);
\item start of the analysis by \Eva{}, with the initial values of global
variables;
\item messages emitted during the analysis, such as warnings, alarms and
informative feedback;
\item values for all variables at the end of each function;
\item an analysis summary, with coverage and the number of
errors, warnings, alarms and logical properties.
\end{itemize}
If we focus on the initial values of variables, we notice that,
apart from the variables defined in the source code, there
is a rather strange variable named \lstinline|Skein_Swap64_ONE| and containing
\lstinline|1|. Searching the source code reveals that this variable is
in fact variable \verb|ONE|, declared \lstinline|static| inside function
\lstinline|Skein_Swap64|. Frama-C can
display the value of static variables, something that is annoying to
do when using a C compiler for testing.
The GUI displays the original source code alongside the transformed one,
and allows navigating between uses of a variable and its declaration.
A quick inspection shows that the Skein functions
use variable \lstinline|ONE| to detect endianness.
Frama-C assumes a little-endian architecture
by default, so \Eva{} is only analyzing the
little-endian version of the library (Frama-C
also assumes an IA-64 architecture, so we are only
analyzing the library as compiled and run on this architecture).
The big-endian version of the library could be analyzed
by reproducing the same steps we are taking here for
a big-endian configuration of Frama-C.
You may also see variable names
prefixed by \verb|__fc_|, \verb|__FC_| and \verb|S___fc_|.
These are variables coming from ACSL specifications in the Frama-C
standard library.
After the initial values, the analysis log contains some lines such as:
\begin{logs}
[eva] skein_block.c:56: starting to merge loop iterations
\end{logs}
This means that \Eva{} has encountered a loop and is performing an
approximation. These messages can be ignored for now.
Arguably some of the most important messages in the analysis are {\em alarms},
such as the one below:
\begin{logs}
[eva:alarm] skein_block.c:69: Warning:
accessing uninitialized left-value. assert \initialized(&ks[i]);
\end{logs}
As we will find out later in this tutorial, this alarm is a false
alarm, an indication of a potential problem in a place where there is
in fact none. However, \Eva{} never remains
silent when a risk of run-time error is present, unlike many other
static analysis tools. The important consequence is that if you get
the tool to remain silent, you have {\em formally verified} that
no run-time error could happen when running the program.
The alarm means that \Eva{} could not prove that the value \verb|ks[i]|
accessed in \verb|skein_block.c:69| was definitely initialized before being
read. Reading an uninitialized value is an undefined behavior according to the
\isoc{} standard (and can even lead to security vulnerabilities).
As is often the case, this alarm is related to the approximation introduced
in \verb|skein_block.c:56|, which is the loop responsible for initializing
the values of array \verb|ks[i]|. The order of the messages during the
analysis thus provides some hints about what is happening, and also some
ideas about how to solve them.
\subsection{Increasing analysis precision in loops}
The previous alarm is just one of several related to variable initialization.
Before we spend any of our time looking at each of these alarms,
trying to determine whether they are true or false,
it is a good idea to make the analyzer spend more of {\em its} time
trying to determine the same thing.
There are different settings that influence the trade-off
between precision and resource consumption.
David Bühler
committed
When dealing with bounded loops, option \verb+-eva-auto-loop-unroll N+
automatically unrolls simple loops that have less than \verb+N+ iterations.
This option is based on syntactic heuristics to improve the precision
at a low cost, and will not handle complex loops.
It should be enabled in most analyses.
If automatic loop unrolling is insufficient, the next best approach consists in
using an ACSL annotation, \verb+//@ loop unroll N+, to direct \Eva{} to analyze
precisely the \verb+N+ first iterations of the loop before approximating the
results.
Note: both approaches can be combined to enable automatic loop unrolling
{\em except} in a few chosen loops: since the \verb+@loop unroll+ annotation
takes precedence over other options, adding \verb+@loop unroll 0;+ to a loop
will prevent it from being unrolled, even when \verb+-eva-auto-loop-unroll+
is active.
Frama-C's graphical interface can help estimating loop bounds by inspecting the
values taken by the loop counter, as in the example below:
\begin{figure}[hbt]
\centering
\includegraphics[width=0.75\textwidth]{gui-images/gui-loop-to-unroll.png}
\caption{Estimating loop unroll bounds in the GUI}
\label{fig:gui-loop-to-unroll}
\end{figure}
In this case, \verb|WCNT| has a constant value (4), so it could also be used
to estimate the loop bounds.
We can then Ctrl+click on the loop condition in the original source view
(right panel) to open an editor\footnote{The external code editor used by the GUI
can be defined via the menu {\em File - Preferences}.} centered on the loop, and
then add the loop unroll annotation, as follows:
\begin{listing-nonumber}
//@ loop unroll 4;
for (i=0;i < WCNT; i++)
{
ks[i] = ctx->X[i];
ks[WCNT] ^= ctx->X[i]; /* compute overall parity */
}
\end{listing-nonumber}
The value 4 is sufficient to completely unroll the loop, even though \verb|i|
ranges from 0 to 4 (5 values in total). One way to confirm the unrolling is
complete is to check that ``starting to merge loop iterations'' is no
longer emitted when entering the loop. Also, when a loop unroll annotation
is present but insufficient to unroll the loop, a message is emitted:
\begin{logs}
[eva:loop-unroll] skein_block.c:57: loop not completely unrolled
\end{logs}
Once a loop is completely unrolled, any ``leftovers'' are ignored, so it incurs
no extra cost. Using a value larger than necessary is therefore not an issue.
The practical effect of adding this annotation and then re-running \Eva{} is
the elimination of 3 alarms in the analysis. Thanks to the extra precision
allowed by the annotation (with a minimal increase in analysis time), \Eva{}
is now able to prove there are no initialization errors in more places than
before.
\subsection{Increasing precision}
Option \verb|-eva-precision| allows setting a global trade-off between
precision and analysis time. By default, \Eva{} is tuned to a low precision
to ensure a fast initial analysis. Especially for smaller code bases, it is
often useful to increase this precision, to dismiss ``easy'' alarms.
Precision can be set between 0 and 11, the higher the value, the more precise
the analysis. The default value of \Eva{} is somewhere between 0 and 1, so that
setting \verb|-eva-precision 0| potentially allows it to go even faster
(only useful for large or very complex code bases).
\verb|-eva-precision| is in fact a meta-option, whose only purpose is to set
the values of other options to a set of predefined values. This avoids the user
having to know all of them, and which values are reasonable. Think of it as a
``knob'' to perform a coarse adjustment of the analysis, before fine-tuning
it more precisely with other options.
\verb|-eva-precision| displays a message when it is used, stating which options
are affected and the value given to them. If one of those options is already
specified by the user, that value takes priority over the one chosen by
\verb|-eva-precision|.
Among the options set by \verb|-eva-precision|, there is \verb|-eva-slevel|,
which can be thought of as some kind of ``fuel''
to be consumed during the analysis: as long as there is some slevel,
the analysis will keep states separated and maintain precision at the cost of
extra analysis time; when all of
it is consumed, further states are merged, avoiding increase in analysis time
but approximating the results. It can also be
specified separately for each function (\verb|-eva-slevel-function f:n|).
For complex code bases, however, \verb|-eva-slevel| lacks in stability and
predictability: it is very hard to determine exactly {\em how} it is used.
It can be consumed in the presence of loops, branches, disjunctions;
in nested loops, it is consumed by both inner and outer loops.
In loops with branches, its consumption may become exponential.
And once a satisfactory value is found, later changes to the source code,
ACSL specifications, \Eva{}'s algorithms or other parameters can affect it,
requiring a new parametrization.
In our example, we can quickly try a few values of \verb|-eva-precision|,
such as 1, 2 and 3:
\begin{frama-c-commands}
frama-c -eva-precision 3 -eva *.c > log
\end{frama-c-commands}
Now, the analysis goes rather far without finding any alarm,
but when it is almost done (after the analysis of function
\lstinline|Skein_256_Final|), it produces:
\begin{logs}
...
[eva] using specification for function printf_va_1
[eva:alarm] main_1.c:17: Warning:
accessing uninitialized left-value. assert \initialized(&hash[i]);
[eva] Done for function main
...
\end{logs}
There remains an alarm about initialization, but all the others have been
removed. Trying larger values for precision or slevel does not change this.
In this case, we can afford to spend some time looking at it in more detail.
\subsection{Inspecting alarms in the GUI}
Instead of running \verb|frama-c|, let us use \verb|frama-c-gui|:
\begin{frama-c-commands}
frama-c-gui -eva-precision 3 -eva *.c > log
\end{frama-c-commands}
The GUI allows the user to navigate the source code, to inspect the sets of
values inferred by the analysis and to get
a feeling of how it works. Right-clicking on the function name
at a call site brings a contextual menu to jump
to the function's definition.
In order to return to the caller,
right-clicking on the name of the current function at the top
brings a contextual menu with a list of callers.
You can also use the Back button (left arrow) in the toolbar.
The Information panel also offers navigation possibilities between
variables and their definitions.
In the graphical interface, three panels (displayed below the
source code) are very useful for \Eva{}: Properties, Values, and
Red Alarms. Further details about the GUI are presented in chapter~\ref{gui}.
For now, it suffices to say that:
\begin{itemize}
\item Properties displays (among others) the list of alarms in the program;
\item Values displays the inferred values for the expression selected in the
Frama-C normalized source code (highlighted in green);
\item Red Alarms displays some special cases of alarms that should be considered
first.
\end{itemize}
In the GUI, each alarm is represented with a ``bullet'' to its left.
Red bullets mean ``this {\em always} happens''; yellow
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
bullets mean ``this {\em may} happen (but I am not sure, due to
approximations)''. When there are several alarms, some strategies allow
prioritizing which alarms are more likely to correspond to actual issues.
The overall rule of thumb when inspecting alarms in the GUI\footnote{The first
time the Frama-C GUI is run, the Properties panel displays several kinds of
properties. For \Eva{}, the most useful view consists in selecting the
``Reset 'Status' filters to show only unproved/invalid'' option, using the
button next to ``Refresh'' in the Properties panel.} is the following:
\begin{itemize}
\item inspect red alarms in the Properties panel; these correspond to
situations where the analysis is {\em certain} that a problem has arrived;
either an actual bug in the code, or some issue in the analysis that needs to
be addressed (e.g. due to missing or incorrect specifications);
\item inspect the cases present in the Red Alarms panel; these are situations
which are likely to correspond to definitive problems, and thus should be
treated before others\footnote{The {\em exact} meaning of such red alarms is
too complex to be presented here; as a first approximation, these can be
seen as intermediate between red and yellow alarms.};
\item inspect remaining cases (yellow alarms) in the Properties panel.
\end{itemize}
In the case of our remaining alarm, it is yellow, but it appears in the
Red Alarms panel. This indicates it is more likely to correspond to a real
issue.
Extra information can be observed by clicking the alarm in the Red Alarms tab,
which highlights it in the source code, then selecting expression
\lstinline|hash| in line 17 and reading the \textsf{Values} tab:
\begin{logs}
[0] $\in$ {200}
[1..7] $\in$ UNINITIALIZED
\end{logs}
This means that element \lstinline|hash[0]| has value 200, but elements 1 to 7
are definitely uninitialized. Therefore the first iteration of the loop
proceeds as expected, but on the second iteration, the analysis of the branch
is stopped, because
from the point of view of the analyzer, reading an uninitialized
value should be a fatal error. And there are no other execution
branches that reach the end of this loop, which is why the analyzer
finds that function \lstinline|main| does not terminate (the log contains
the message ``NON TERMINATING FUNCTION'').
\subsection{Finding and fixing bugs}
We found a real issue in the code, but to find the exact cause it is necessary
to investigate the code.
Looking again at the header file \lstinline|skein.h|, we may now
notice that in function \lstinline|Skein_256_Init|,
the formal parameter for the desired hash length is named
\lstinline|hashBitLen|. This parameter should certainly be expressed in bits!
We were inadvertently asking for a 1-char hash of the message
since the beginning, and the test that we ran as our first
step failed to notice it.
The bug can be fixed by passing
8*HASHLEN instead of HASHLEN as the second argument of
\lstinline|Skein_256_Init|.
With this fix in place, the analysis
with \lstinline|-eva-precision 3| produces no alarms and gives the following
result:
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
\begin{logs}
Values for function main:
hash[0] $\in$ {40}
[1] $\in$ {234}
[2] $\in$ {138}
[3] $\in$ {230}
[4] $\in$ {134}
[5] $\in$ {120}
[6] $\in$ {37}
[7] $\in$ {35}
i $\in$ {8}
\end{logs}
Meanwhile, compiling and executing the fixed test produces the result:
\begin{logs}
40
234
138
230
134
120
37
35
\end{logs}
\section{Guaranteeing the absence of bugs}
\subsection{Generalizing the analysis to arbitrary messages of fixed length}
The analysis we have done so far is very satisfying because
it finds problems that are not detected by a C compiler or by testing.
The results of this analysis only prove
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
the absence of run-time errors\footnote{and the absence
of conditions that
{\em should} be run-time errors -- like the uninitialized access already
encountered} when the particular message that we
chose is being hashed, though.
It would be much more useful to have the assurance that there are
no run-time errors for any input message, especially since the library
might be under consideration for embedding in a device where anyone
(possibly a malicious user) will be able to choose the message
to hash.
A first generalization of the previous analysis is to include
in the subject matter
the hashing of all possible 80-character messages. We can do this
by separating the analyzed program in two distinct phases, the first
one being the construction of a generalized analysis context
and the second one being made of the sequence of function calls
that we wish to study:
\listinginputcaption{1}{\texttt{main\_2.c}}{tutorial/main_2.c}
From this point onward the program is no longer executable because
of the call to builtin primitives such as \lstinline|Frama_C_interval|.
We therefore dispense with the final calls to \lstinline|printf|,
since \Eva{} offers simpler ways to observe intermediate
and final results. Note that we included \verb|__fc_builtin.h|, a file
that comes from the Frama-C distribution and which
defines \lstinline|Frama_C_interval|. Running Frama-C on this
file (without \verb|main_1.c| in the same directory, to avoid having two
definitions for \verb|main|):
\begin{frama-c-commands}
frama-c -eva-precision 3 -eva *.c > log 2>&1
\end{frama-c-commands}
This time, the absence of actual alarms is starting to be really interesting:
it means that it is formally excluded that the functions
\lstinline|Skein_256_Init|, \lstinline|Skein_256_Update|, and
\lstinline|Skein_256_Final| produce a run-time error when
they are used, in this order, to initialize a local variable
of type \lstinline|Skein_256_Ctxt_t| (with argument 64 for the size),
parse an arbitrary message and produce a hash in a local \lstinline|u08b_t|
array of size~8.
\subsection{Verifying functional dependencies}
If we had written a formal specification for function Skein, we would soon
have expressed that we expected it to modify the buffer \lstinline|hash|
that is passed to \lstinline|Skein_256_Final| (all 8 bytes of the buffer),
to compute the new contents of this buffer from the contents of the
input buffer \lstinline|msg| (all 80 bytes of it), and from nothing else.
During \Eva{}'s analysis, we have seen that all of the buffer \verb|hash|
was always modified in the conditions of the analysis: the reason is that
this buffer was uninitialized before the sequence of calls,
and guaranteed to be initialized after them.
We can get the complete list of locations that may be modified by each
function by adding the option \verb|-out| to the other options we were
already using. These locations are computed in a quick analysis that comes
after \Eva{}. In the results of this analysis, we find:
\begin{logs}
[inout] Out (internal) for function main:
Frama_C_entropy_source; msg[0..79]; i; hash[0..7]; skein_context; tmp
\end{logs}
The ``(internal)'' reminds us that this list includes the local variables
of \lstinline|main| that have been modified. Indeed, all the variables
that we could expect to appear in the list are here: the input buffer, that
we initialized to all possible arbitrary 80-char messages; the loop index
that we used in doing so; the output buffer for receiving the hash;
and Skein's internal state, that was indirectly modified by us when we called
the functions \lstinline|Init|, \lstinline|Update| and \lstinline|Final|.
If we want the outputs of the sequence to appear more clearly, without the
variables we used for instrumenting, we can put it in its own function:
\begin{listing}{1}
u08b_t hash[HASHLEN];
void do_Skein_256(void)
{
Skein_256_Ctxt_t skein_context;
Skein_256_Init( &skein_context, HASHLEN * 8);
Skein_256_Update( &skein_context, msg, 80);
Skein_256_Final( &skein_context, hash);
}
void main(void)
{
int i;
for (i=0; i<80; i++) msg[i]=Frama_C_interval(0, 255);
do_Skein_256();
}
\end{listing}
Using option \verb|-out-external| in order to obtain lists of
locations that exclude each function's local variables, we get:
\begin{logs}
[inout] Out (external) for function do_Skein_256:
hash[0..7]
\end{logs}
This means that no location other than \lstinline|hash[0..7]| was
modified by the sequence of calls to Skein-256 functions. It doesn't
mean that each of the cells of the array was overwritten: we have to
rely on the results of \Eva{} when \lstinline|hash| was a
local variable for that result. But it means that when used in
conformance with the pattern in this program, the functions do not
accidentally modify a global variable. We can conclude from this