Skip to content
Snippets Groups Projects
main.tex 226 KiB
Newer Older
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 430 431 432 433 434 435 436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452 453 454 455 456 457 458 459 460 461 462 463 464 465 466 467 468 469 470 471 472 473 474 475 476 477 478 479 480 481 482 483 484 485 486 487 488 489 490 491 492 493 494 495 496 497 498 499 500 501 502 503 504 505 506 507 508 509 510 511 512 513 514 515 516 517 518 519 520 521 522 523 524 525 526 527 528 529 530 531 532 533 534 535 536 537 538 539 540 541 542 543 544 545 546 547 548 549 550 551 552 553 554 555 556 557 558 559 560 561 562 563 564 565 566 567 568 569 570 571 572 573 574 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 594 595 596 597 598 599 600 601 602 603 604 605 606 607 608 609 610 611 612 613 614 615 616 617 618 619 620 621 622 623 624 625 626 627 628 629 630 631 632 633 634 635 636 637 638 639 640 641 642 643 644 645 646 647 648 649 650 651 652 653 654 655 656 657 658 659 660 661 662 663 664 665 666 667 668 669 670 671 672 673 674 675 676 677 678 679 680 681 682 683 684 685 686 687 688 689 690 691 692 693 694 695 696 697 698 699 700 701 702 703 704 705 706 707 708 709 710 711 712 713 714 715 716 717 718 719 720 721 722 723 724 725 726 727 728 729 730 731 732 733 734 735 736 737 738 739 740 741 742 743 744 745 746 747 748 749 750 751 752 753 754 755 756 757 758 759 760 761 762 763 764 765 766 767 768 769 770 771 772 773 774 775 776 777 778 779 780 781 782 783 784 785 786 787 788 789 790 791 792 793 794 795 796 797 798 799 800 801 802 803 804 805 806 807 808 809 810 811 812 813 814 815 816 817 818 819 820 821 822 823 824 825 826 827 828 829 830 831 832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875 876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920 921 922 923 924 925 926 927 928 929 930 931 932 933 934 935 936 937 938 939 940 941 942 943 944 945 946 947 948 949 950 951 952 953 954 955 956 957 958 959 960 961 962 963 964 965 966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984 985 986 987 988 989 990 991 992 993 994 995 996 997 998 999 1000
\documentclass[web]{frama-c-book}

\usepackage{microtype}
\usepackage{lmodern}
\usepackage{booktabs}

% Commandes pour mettre des ref dans l'index :
\newcommand{\idb}[1]{\textbf{#1}}
\newcommand{\indextxt}[1]{\index{#1}{\bf #1}}
\newcommand{\indexdef}[1]{\index{#1|idb}{{\bf #1}}}
\newcommand{\indextxtdef}[2]{\index{#2|idb}{{\bf #1}}}

\renewcommand{\labelitemii}{$\triangleright$}

\newcommand{\bropen}{\mbox{\tt [}}
\newcommand{\brclose}{\mbox{\tt ]}}
\newcommand{\cbopen}{\mbox{\tt \{}}
\newcommand{\cbclose}{\mbox{\tt \}}}

\newcommand{\framacversion}%
           {\input{../../VERSION} (\input{../../VERSION_CODENAME}\unskip)}

\newcommand{\Eva}{\textsf{Eva}}

\lstset{literate=%
{}{{$\in$ }}1
}

% The following definitions are used to differentiate between
% the open-source manual and the closed-source one.
% Source: http://tex.stackexchange.com/questions/62010
% Note: does NOT work with xetex/luatex.
% Also, note that some weird situations may arrive inside
% \ifdefstring blocks, e.g. \lstinline blocks require escaping
% backslashes, otherwise this file will not parse.
\usepackage{etoolbox}
\usepackage{catchfile}
\newcommand{\getenv}[2][]{%
  \CatchFileEdef{\temp}{"|kpsewhich --var-value #2"}{\endlinechar=-1\relax}%
  \if\relax\detokenize{#1}\relax\temp\else\edef#1{\temp}\fi}

% define \OPENSOURCE to the value of the OPENSOURCE environment variable
\getenv[\OPENSOURCE]{OPENSOURCE}

%==============================================================================
\begin{document}

\coverpage{Eva – The Evolved Value Analysis plug-in}

\begin{titlepage}
\begin{flushleft}
\includegraphics[height=14mm]{cealistlogo.jpg}
\end{flushleft}
\vfill
\title{The \Eva{} plug-in}{\framacversion}
\author{David Bühler, Pascal Cuoq and Boris Yakobowski. \\
  With Matthieu Lemerre, André Maroneze,
  Valentin Perrelle and Virgile Prevosto}
\begin{tabular}{l}
CEA LIST, Software Reliability Laboratory, Saclay, F-91191 \\
\end{tabular}
\vfill
\begin{flushleft}
  \textcopyright 2011-\the\year{} CEA LIST

  This work has been supported by the ANR project U3CAT
  (ANR-08-SEGI-021-01).
\end{flushleft}
\end{titlepage}

\tableofcontents

\chapter{Introduction}\label{introduction}
\vspace{2cm}

{\em Frama-C is a modular static analysis framework for the C language.\\
This manual documents the \Eva{} (for \emph{Evolved Value Analysis}) plug-in of Frama-C.}
\vspace{2cm}

The \Eva{} plug-in automatically computes sets of possible
values for the variables of an analyzed program. \Eva{} 
warns about possible run-time errors in the analyzed program.
Lastly, synthetic information
about each analyzed function can be computed automatically from the
values provided by \Eva{}: these functionalities (input
variables, output variables, and information flow) are also documented
here.  \bigskip

The framework, the \Eva{} plug-in and the other plug-ins 
documented here are all Open Source software. They can be
downloaded from \url{http://frama-c.com/}.
\bigskip

In technical terms, \Eva{} is context-sensitive and
path-sensitive. In non-technical terms, this means that it tries to
offer precise results for a large set of C programs. Heterogeneous
pointer casts, function pointers, and floating-point computations are
handled.

\section{First contact}
Frama-C comes with two interfaces: batch and interactive.
The interactive graphical interface of Frama-C displays
a normalized version of the analyzed source code.
In this interface, the \Eva{} plug-in allows the user
to select an expression in the
code and observe an over-approximation of the set of 
values this expression can take at run-time. 

Here is a simple C example:
\listinginputcaption{1}{\texttt{introduction.c}}{introduction.c}

If either interface of Frama-C is launched with options
\lstinline|-eva introduction.c|,
the \Eva{} plug-in is able to guarantee that at each passage through 
the \lstinline|return| statement of 
function \lstinline|f|, the global variables \lstinline+y+ and
\lstinline+z+ each contain either 1 or 3. 
At the end of function  \lstinline|main|,
it indicates that  \lstinline|y|
necessarily contains  4, and the value of  \lstinline|z| is again 1 or 3.

When the plug-in indicates the value of \lstinline|y| is 1 or 3
at the end of function \lstinline|f|, it implicitly computes the union
of all the values in \lstinline|y|
at each passage through this program point throughout any execution.
In an actual execution of this deterministic program,
there is only one passage though the end
of function  \lstinline|main|, and therefore only one value for \lstinline|z|
at this point.
The answer given by \Eva{} is approximated but correct (the actual
value, 3, is among the proposed values).

The theoretical framework on which \Eva{} is founded is called
Abstract Interpretation and has been the subject of extensive research
during the last forty years.

\section{Run-time errors and the absence thereof}

An analyzed application may contain run-time errors: divisions by zero,
invalid pointer accesses,\ldots

\listinginputcaption{1}{\texttt{rte.c}}{rte.c}

When launched with \lstinline|frama-c -eva rte.c|,
\Eva{} emits a warning about an out-of-bound access at line 5:
\begin{logs}
rte.c:5:[kernel] warning: accessing out of bounds index. assert i < 10;
\end{logs}
There is in fact an out-of-bounds access at this line in the program.
It can also happen that, because of approximations during its computations,
Frama-C emits warnings for constructs
that do not cause any run-time errors. These are called ``false alarms''.
On the other hand, the fact that \Eva{}
computes correct, over-approximated sets of possible values prevents it
from remaining silent on a program that contains a run-time error. For 
instance, a particular division in the analyzed program
is the object of a warning 
as soon as the set of possible values for the divisor
contains zero. Only
if the set of possible values computed by \Eva{}
does not contain zero is the warning omitted, and
that means that the divisor really cannot be null at run-time.

\section{From Value Analysis to \Eva{}}

The \Eva{} plug-in was previously called the \emph{Value Analysis plug-in}.
Following major changes in its expressivity and overall precision,
the plugin was subsequently renamed to \emph{Evolved Value Analysis}, or
\Eva{} for short. Those changes were first made available with the
Aluminium version of Frama-C. They are presented in section~\ref{sec:eva}.

\section{Other analyses based on \Eva{}}

Frama-C also provides
synthetic information on the behavior of analyzed functions:
inputs, outputs, and dependencies. This information is computed
from the results of the \Eva{} plug-in,
and therefore some familiarity with \Eva{}
is necessary to get the most of these computations.

\chapter{Tutorial}\label{tutorial}
\vspace{2cm}

{\em Some use cases for \Eva{}\ldots}

\vspace{2cm}

\section{Introduction}

Throughout this tutorial, we will see on a single example how to use
\Eva{} for the following tasks :
\begin{enumerate}
\item to get familiar with foreign code,
\item to produce documentation automatically,
\item to search for bugs, 
\item to guarantee the absence of bugs.
\end{enumerate}
It is useful to stick to a single example in this tutorial, and
there is a natural progression to the list of results above,
but in real life, a single person would generally focus on
only one or two of these four tasks for a given
codebase. For instance, if you need Frama-C's help to reverse
engineer the code as in tasks 1 and 2, you have
probably not been provided with the quantity of documentation
and specifications
that is appropriate for meaningfully carrying out task 4.

Frama-C helps you to achieve tasks 1-3 in less time than you would
need to get the same results using the traditional approach of writing
tests cases. Task 4, formally guaranteeing the absence of bugs, can in
the strictest sense not be achieved at all using tests for two
reasons.  Firstly, many forms of bugs that occur in a C program
(including buffer overflows) may cause the behavior of the program to
be non-deterministic.  As a consequence, even when a test suite comes
across a bug, the faulty program may appear to work correctly during
tests and fail later, after it has been deployed.  Secondly, the
notion of coverage of a test suite in itself is an invention made
necessary because tests aren't usually exhaustive.  Assume a
function's only inputs are two 32-bit integers, each allowed to take
the full range of their possible values.  Further assume that this
function only takes a billionth of a second to run (a couple of cycles
on a 2GHz processor).  A pencil and the back of an envelope show that
this function would take 600 years to test exhaustively.  Testing
coverage criteria can help decide a test suite is ``good enough'', but
there is an implicit assumption in this reasoning.  The assumption is
that a test suite that satisfies the coverage criteria finds all the
bugs that need to be found, and this assumption is justified
empirically only.

\section{Target code: Skein-256}

A single piece of code, the reference implementation for the Skein
hash function, is used as an example throughout this document. As of
this writing, this implementation is available at
\url{http://www.schneier.com/code/skein.zip}. The Skein function is
Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir
Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker's entry in the
NIST cryptographic hash algorithm competition for SHA-3.

Cryptographic hash functions are one of the basic building blocks from
which cryptographic protocols are designed. Many cryptographic
protocols are designed with the assumption that an implementation for
a function $h$ from strings to fixed-width integers is available with
the following two properties:
\begin{itemize}
\item it is difficult to find two distinct, arbitrary strings $s_1$ and $s_2$
such that $h(s_1)=h(s_2)$,
\item for a given integer $i$, it is difficult to build a string $s$
such that $h(s)=i$.
\end{itemize}
A function with the above two properties is called a cryptographic hash
function. It is of the utmost importance that the function actually
chosen in the implementation of a cryptographic application satisfies the
above properties! Any weakness in the hash function can be
exploited to corrupt the application.

Proofs of security for
cryptographic hash functions are complicated mathematical affairs.
These proofs are made using a mathematical definition
of the function. Using static analysis for verifying that a piece
of code corresponds to the mathematical model
which was the object of the security proofs is an interesting
topic but is outside the scope of this tutorial. 
In this document, we will not  be using \Eva{} for
verifying cryptographic properties.

We will, however, use \Eva{} to familiarize ourselves
with Skein's reference implementation, and we will see that
it can be a more useful tool than, say, a C
compiler, to this effect. We will also use \Eva{} to look
for bugs in the implementation of Skein, and finally prove that the
functions that implement Skein may never cause a run-time error for a
general use pattern. Because of the numerous pitfalls of the C
programming language, any amount of work at the 
mathematical level cannot exclude the possibility of problems
such as buffer overflows in the implementation.
It is a good thing to be able to rule these out with \Eva{}.

\Eva{} is most useful for embedded or embedded-like
code. Although the Skein library does not exactly fall in this category,
it does not demand dynamic allocation and uses few functions from external
libraries, so it is well suited to this analysis.

\section{Using \Eva{} for getting familiar with Skein}

\subsection{Writing a test}

After extracting the archive \lstinline|skein_NIST_CD_010509.zip|,
listing the files in\\
\lstinline|NIST/CD/Reference_Implementation| shows:
\begin{listing}{1}
-rw-r--r--  1 pascal  501   4984 Oct 14 00:52 SHA3api_ref.c
-rw-r--r--  1 pascal  501   2001 Oct 14 00:54 SHA3api_ref.h
-rw-r--r--  1 pascal  501   6141 Sep 30 00:28 brg_endian.h
-rw-r--r--  1 pascal  501   6921 May 17  2008 brg_types.h
-rw-r--r--  1 pascal  501  34990 Jan  6 01:39 skein.c
-rw-r--r--  1 pascal  501  16290 Nov  9 05:48 skein.h
-rw-r--r--  1 pascal  501  18548 Oct  7 20:02 skein_block.c
-rw-r--r--  1 pascal  501   7807 Oct 10 02:47 skein_debug.c
-rw-r--r--  1 pascal  501   2646 Oct  9 23:44 skein_debug.h
-rw-r--r--  1 pascal  501   1688 Jul  3  2008 skein_port.h
\end{listing}

The most natural place to go next is the file \lstinline|skein.h|,
since its name hints that this is the header with the
function declarations that should be called from outside the library.
Scanning the file quickly, we may notice declarations such as
\begin{listing-nonumber}
typedef struct /*  256-bit Skein hash context structure */
    {
      [...]
    } Skein_256_Ctxt_t;

/*   Skein APIs for (incremental) "straight hashing" */
int  Skein_256_Init  (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
[...]
int  Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg, 
                                                 size_t msgByteCnt);
[...]
int  Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
\end{listing-nonumber}

The impression we get at first glance is that the hash
of an 80-char message containing 
the string ``People of Earth, your attention, please''
can be computed as simply as declaring a variable of
type \lstinline|Skein_256_Ctxt_t|, letting \lstinline|Skein_256_Init|
initialize it, passing \lstinline|Skein_256_Update| a representation
of the string, and calling \lstinline|Skein_256_Final| with the
address of a buffer where to write the hash value.
Let us write a C program that does just that:
\listinginputcaption{1}{\texttt{main\_1.c}}{tutorial/main_1.c}

In order to make the test useful, we have to print
the obtained hash value. Because the result we are interested in is
an 8-byte number, represented as a char array of 
arbitrary characters (some of them non-printable), we cannot
use string-printing functions, hence the \lstinline|for| loop at lines 17-18.
\goodbreak

With luck, the compilation goes smoothly and we obtain
the hash value:
\begin{shell}
gcc *.c
./a.out
\end{shell}

\begin{logs}
215
215
189
207
196
124
124
13
\end{logs}

\subsection{First try at an analysis using \Eva{}}

Let us now see how \Eva{} works on the same example.
\Eva{} can be launched with the following command. 
The analysis should not take more than a few seconds:
\begin{listing-nonumber}
frama-c -eva *.c >log
\end{listing-nonumber}

Frama-C has its own model of the target platform (the default target is
a little-endian 32-bit platform). It also uses the host system's preprocessor.
If you want to do the analysis for a different platform than the host platform,
you need to provide Frama-C with a way to pre-process the files as they
would be during an actual compilation. 

There are analyses where the influence of host platform parameters
is not noticeable. The analysis we are embarking on is not one of them.
If you pre-process the Skein source code with a 64-bit compiler and
then analyze it with Frama-C targeting its default 32-bit platform,
the analysis will be meaningless and you won't be able to make sense of
this tutorial.
If you are using a 64-bit compiler, simply match
Frama-C's target platform with your host header files by systematically adding 
the option \verb|-machdep x86_64| to all commands in
this tutorial.

The ``\lstinline|>log|'' part of the command sends all
the messages emitted by Frama-C into a file named
``\lstinline|log|''. \Eva{} is verbose for a number of reasons that
will become clearer later in this tutorial.
The best way to make sense of the information 
produced by the analysis is
to send it to a log file.

There is also a Graphical User Interface 
for creating analysis projects and
visualizing the results. 
The GUI is still evolving quickly at this stage 
and it would not make much sense to
describe its manipulation in detail in this document.
You are however encouraged to try it if it is available on your
platform.
One way to create an analysis project in the GUI
is to pass the command \lstinline|frama-c-gui| the same options
that would be passed to \lstinline|frama-c|. In this first example,
the command \lstinline|frama-c-gui -eva *.c| launches
the same analysis and then opens the GUI for inspection of the results.

The initial syntactic analysis and symbolic link phases
of Frama-C may find issues such as
inconsistent types for the same symbol in different files.
Because of the way separate compilation is implemented,
the issues are not detected by standard C compilers.
It is a good idea to check for these issues in the first
lines of the log.

\subsection{Missing functions}

Since we are trying out the library for the first time, 
something else to look for in the log file is the list of functions for
which the source code is missing. \Eva{} requires
either a body or a specification for each function it analyses.

\begin{listing-nonumber}
grep "Neither code nor specification" log
\end{listing-nonumber}

This should match lines indicating functions that are both undefined
(without source) and that have no specification in Frama-C's standard library.
In this example, no functions will be output, since Frama-C's standard library
already has prototypes for the most important library functions.

But suppose you would have a custom printing
function, say \verb|print| instead of \verb|printf|, or some other function
with missing source code. For instance, change the code in \verb|main_1.c|,
replacing \verb|printf| with \verb|print|, re-run \Eva{},
and then grep the log again. You will obtain this output:

\begin{logs}
main_1.c:14:[kernel] warning: Neither code nor specification for function print,
  generating default assigns from the prototype
\end{logs}

The warning indicates that \Eva{} is trying to infer an
{\em assigns} clause for the function. This will be explained in detail later
(\ref{annot_assigns}), but for now the important part is that \Eva{}
needs either source code or an ACSL specification in order to work.
Without those, it can only guess what each function does from the function's
prototype, which is both inaccurate and likely incorrect.

Once we are convinced that all functions have either a body or a specification,
we can focus on functions without bodies:
\begin{listing-nonumber}
grep "using specification for function" log
[eva] using specification for function memset
[eva] using specification for function memcpy
[eva] using specification for function printf
\end{listing-nonumber}
We find three functions. Not having a body for \verb|printf| is fine,
since calls to this function have no observable effects for the analysis.
Things are not so clear-cut for
\verb|memcpy| and \verb|memset|, for which we will provide bodies later.

It is also possible to obtain a list of
missing function definitions by using the command:
\begin{listing-nonumber}
frama-c -metrics -metrics-libc *.c
\end{listing-nonumber}
This command computes, among other pieces of information,
a list of missing functions using
a syntactic analysis. It is not exactly equivalent to
grepping the log of \Eva{} because it lists
all the functions that are missing, while the log of
\Eva{} only cites the functions that would have
been necessary to an analysis. When
analyzing a small part of a large codebase, the latter list
may be much shorter than the former. 
In this case, relying on the information displayed by \lstinline|-metrics|
means spending time hunting for functions that are not actually
necessary.


The output of \verb|frama-c -metrics -metrics-libc *.c|
will then include the following lines:

\begin{logs}
Undefined functions (83)
=======================
[...]
 gets (0 call); memchr (0 call); memcmp (0 call); memcpy (21 calls);
 memmove (0 call); memset (27 calls); perror (0 call); printf (1 call)
\end{logs}
All functions included from \verb+string.h+ are undefined, but only
\verb+memcpy+, \verb+memset+ and \verb+printf+ are called.


To handle the absence of functions \lstinline|memset| and \lstinline|memcpy|%
\footnote{Since \FramaC 18 (Argon), \Eva{} includes builtin functions for
  \lstinline|memset| and \lstinline|memcpy|,
  thus providing source code for them is no longer necessary.},
we will provide source code for them.
Sometimes it is a good idea to provide the
exact source code that will actually be used in an actual
running environment, so as to detect subtle bugs in the interactions
between the program and the system libraries. 
On the other hand, string manipulation functions often 
feature difficult-to-analyze 
architecture-dependent optimizations.
For these, it is better to provide a simpler but equivalent implementation.
In this particular case, let us provide these two
functions in a file that we will place with the others and 
name ``\lstinline|lib.c|''.
\listinginput{1}{tutorial/lib.c}

%% Some compilation platforms' headers define the names \lstinline|memset|
%% and \lstinline|memcpy| in ways that make it impossible to provide
%% your own implementation for these functions. If this happens for
%% yours, you can try placing your own header in the analysis directory,
%% under the name ``\lstinline|string.h|''
%% \listinginput{1}{tutorial/string.h}

%% In some circumstances, the results of \Eva{} may
%% remain useful even when letting Frama-C guess the effects
%% of missing functions. In the case of functions that take
%% pointers as arguments, however, the possibilities are too numerous
%% for the analyzer to be able to guess right.
%% In this example, the function \lstinline|printf| is safe to
%% leave without an implementation, because 
%% unlike the other two,
%% it does not have effects on the variables of the program.

%. It is generally far simpler to provide
%either the actual code or a model written in C
%for the missing functions.

For some functions -- including those two -- an implementation is
already provided in \FramaC. It can be found in the \lstinline|.c|
files of the directory \lstinline|share/frama-c/libc|, usually within
a file identically named to the header declaring the functions.  For
\lstinline|memcpy| and \lstinline|memset|, the implementation can thus
be found in \lstinline|string.c|, and adding it to the list of files
to be analyzed would make the warning disappear.
%
However, we are going to assume that such implementations are not available
in the remainder of this tutorial.  Since quite a few functions do not have
any implementation in \FramaC's \lstinline|libc|, this allows us to show how
to deal with the general case.

\subsection{First meaningful analysis}

If we run the analysis again with definitions for \lstinline|memset|
and \lstinline|memcpy|, the newly obtained log no longer mentions
any missing function other than \lstinline|printf|\footnote{It is
a good idea to
check again for missing functions because new execution paths
could have been activated by the implementations of functions 
that were previously missing.}.

The log next contains a description of the initial values of
variables.  Apart from the variables that we defined ourselves, there
is a rather strange one, named \lstinline|ONE| and indeed containing
\lstinline|1|. Searching the source code reveals that this variable is
in fact declared \lstinline|static| as it should be.  Frama-C can
display the value of static variables (something that is annoying to
do when using a C compiler for testing).  Frama-C may rename
static variables in order to distinguish them from another variable
with the same name.  The GUI displays the original source code
alongside the transformed one.

A quick inspection shows that the Skein functions
use variable \lstinline|ONE| to detect endianness.
Frama-C assumes a little-endian architecture
by default, so \Eva{} is only analyzing the
little-endian version of the library (Frama-C
also assumes an IA-32 architecture, so we are only
analyzing the library as compiled and run on this architecture).
The big-endian version of the library could be analyzed
by reproducing the same steps we are taking here for
a big-endian configuration of Frama-C.

You will also see lots of variable names
prefixed by \verb|__fc_|, \verb|__FC_| and \verb|S___fc_|.
These are all variables coming from ACSL specifications in the Frama-C
standard library.

The next line in the log is a progression message, that indicates
that \Eva{} has encountered a loop and is performing an approximations.
Those messages can be ignored for now.
\begin{logs}
lib.c:17:[eva] entering loop for the first time
\end{logs}

The next log entry which is not a progression message is a
warning concerning variable \lstinline|tmp|
\begin{logs}
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
                                       (tmp from ptr++)
\end{logs}

There is no variable \lstinline|tmp| in our file \lstinline|lib.c|,
but as indicated in the message, this variable is introduced by the source
code normalization of the \verb|ptr++| expression.
You can also see the relation between this variable
and the original source code in the GUI.
Alternately, the command \lstinline|frama-c *.c -print|
shows how the original source code has been
transformed by Frama-C (note, however, that the location
\lstinline|lib.c:18| refers to the original source file).
Here, it is the statement \lstinline|*ptr++ = val;| in the
function \lstinline|memset| that has been transformed by
Frama-C into the sequence below, so that it would not
be ambiguous which value of \lstinline|ptr| 
has to be a valid pointer.
\begin{listing-nonumber}
  tmp = ptr;
  ptr ++;
  *tmp = (unsigned char)val;
\end{listing-nonumber}

The command \lstinline|frama-c *.c -eva -print| launches
\Eva{} and then prints the transformed source code,
annotated with the
alarms raised by \Eva{}. In our case, the
function \lstinline|memset| is transformed in
\begin{listing-nonumber}
void *memset(void *dest, int val, size_t len)
{
  unsigned char *ptr;
  ptr = (unsigned char *)dest;
  while (1) {
    size_t tmp_0;
    unsigned char *tmp;
    { /* sequence */
      tmp_0 = len;
      len --;
      ;
    }
    if (! (tmp_0 > (size_t)0)) break;
    { /* sequence */
      tmp = ptr;
      ptr ++;
      /*@ assert Value: mem_access: \valid(tmp); */
      *tmp = (unsigned char)val;
    }
  }
  return dest;
}
\end{listing-nonumber}

As we will find out later in this tutorial, this alarm is a false
alarm, an indication of a potential problem in a place where there is
in fact none. On the other hand, \Eva{} never remains
silent when a risk of run-time error is present, unlike many other
static analysis tools. The important consequence is that if you get
the tool to remain silent, you have {\em formally verified} that
no run-time error could happen when running the program.
 
It may seem that the access to \lstinline|*tmp| is the
only dangerous operation in the function \lstinline|memset|,
and therefore that the analyzer is not doing a very good job
of pointing only the operations that are problematic.
While it is true that the results obtained on this example
are not very precise, the comparison
\lstinline|>| is also considered dangerous 
by the analyzer. This operation may be unspecified when applied
to an invalid pointer. Consider a program that does the
following:
\begin{listing}{1}
int main()
{
  int a;
  return ((unsigned int)(&a - 1432)) > 0;
}
\end{listing}
The programmer might misguidedly have written the \lstinline|>0|
test in this example as a way to test for \lstinline|NULL|.
Ey\footnote{This manual uses Spivak pronouns:
\url{http://en.wikipedia.org/wiki/Spivak_pronoun}} might 
expect this program always to return 1, since
\lstinline|(&a - 1432)| is not \lstinline|NULL|. But this
program may in fact return 0 if, out of bad luck,
the compiler places variable \lstinline|a| precisely at address
1432. This kind of bug would be very
difficult to find and/or reproduce by testing.

The analyzer does not emit any alarm for the comparison
\lstinline|tmp_0 > (size_t )0| in the normalized source code
for \lstinline|memset|, and
this means that it guarantees\footnote{The fine print is
the reference part of this manual} that 
this particular comparison is never non-deterministic
because of an issue such as the one just described.
This conclusion can only be reached by looking at 
the arguments actually passed to the function \lstinline|memset|.
Therefore, this conclusion is only valid for all the
possible executions coming from the \lstinline|main|
function that we provided, and not for all the possible
calls to function \lstinline|memset| that a programmer
could write.

One should not spend too much time at this stage trying to
determine if the dozens or so alarms emitted by the analyzer
are true alarms that indicate an actual problem or false alarms
that don't.
The most important information is that the analysis did not
take an unreasonable time.
% The second most important information is that the analysis
% seems to have explored all the paths we intended for it to
% explore, as seen in the list of functions for which values are
% printed or in the log entries such as:
% \begin{logs}
% [value] computing for function RotL_64 <- Skein_256_Process_Block
%                       <- Skein_256_Final <- main.
%         Called from skein_block.c:100.
% \end{logs}

The GUI allows to inspect the sets of values 
obtained during the analysis and to get
a feeling of how it works. Right-clicking on the function name
in a function call brings a contextual menu that allows to go
to the function's definition and to inspect it in turn. 
In order to return to the caller,
right-clicking on the name of the current function at the top
of the normalized code buffer brings a contextual menu
with a list of callers.
You can also use the Back button (left arrow) in the toolbar.

\section{Searching for bugs}

\subsection{Increasing the precision with option {\tt -eva-slevel}}

Because we compiled and executed the same program that we are
now analyzing, we are confident that most of the alarms
displayed by \Eva{} are false alarms that do not
correspond to actual problems. In fact, because the program
is deterministic and takes no inputs,
only one of the alarms can be a true alarm in
the strictest sense. The analysis stops when an error is
encountered (what would it even mean to continue the analysis
after, for instance, \lstinline|NULL| has been dereferenced?).
It is only because \Eva{} is uncertain about the errors
encountered here that it continues and finds more possible
errors.

Before we spend any of our time looking at each of these alarms,
trying to determine whether it is true or false,
it is a good idea to make the analyzer spend more of its time
trying to determine whether each alarm is true or false.
There are different settings that influence the compromise
between precision and resource consumption.
If you chose to remember only one of these settings, it would
have to be the option \lstinline|-eva-slevel|. This option has
two visible effects:
it makes the analyzer unroll loops, and it makes it propagate
separately the states that come from the \lstinline|then|
and \lstinline|else| branches of a conditional statement.
This makes the analysis more precise (at the cost of being
slower) for almost every program that can be analyzed.

\Eva{} has different ways to provide information on 
the loss of precision that causes the false alarms. 
This is another reason why it is so verbose during an analysis:
it tries to provide enough information for a motivated user
to be able to understand what is happening during the analysis,
where the imprecise values that cause the false alarms came from,
and what instructions in the program made them imprecise.

But instead of spending precious human time  making use
of the provided information, the brute-force approach of blindly
applying the \lstinline|-eva-slevel| option must be tried first.
Indeed, \lstinline|-eva-slevel| is one of the options of \Eva{}
that can never cause it to become incorrect. Such options,
when used wrongly, may make the analysis slower, 
provide hints that are not
useful, or even possibly counter-productive (making the analysis
less precise), but they cannot
prevent the analyzer from reporting a problem where there is one.

The \lstinline|-eva-slevel| option is useful for unrolling loops.  We do
not know without further inspection how many iterations the loops
inside the program need, but the input message is 80 characters long,
and we can assume that it is read inside a loop, so using the option
\lstinline|-eva-slevel 100| should have a good chance of unrolling at
least that loop. Furthermore, it does not make the analysis slower to
use a number greater than the value actually necessary to unroll
every loop and conditional in the program. We are limiting ourselves
to 100 in this first try because we do not know how much time the
analysis will take with this number of unrolled branches.  If, with
the parameter \lstinline|-eva-slevel 100|, the precision of the analysis still isn't
satisfactory and the time spent by the analyzer remains reasonable, we
can always try a higher value. This should be progressive, because a value
higher than a few hundreds, when there really are that many branches to
unroll in the program, can make the analysis very slow.

\begin{shell}
frama-c -eva-slevel 100 -eva *.c >log 
\end{shell}

The analysis goes rather far without finding any alarm,
but when it is almost done (after the analysis of function
\lstinline|Skein_256_Final|), it produces:
\begin{logs}
...
main_1.c:14:[kernel] warning: Neither code nor specification for function printf, 
            generating default assigns from the prototype
[eva] using specification for function printf
[eva] Done for function printf
main_1.c:14:[eva] completely invalid value in evaluation of
                  argument (int)hash[i]
main_1.c:14:[kernel] warning: accessing uninitialized left-value.
                     assert \initialized(&hash[i]);
[eva] Recording results for main
[eva] done for function main
...
\end{logs}

These messages mean that lvalue \lstinline|hash[i]|, as found
in line 14 of our file \lstinline|main_1.c|, may be
uninitialized\footnote{An lvalue is a C expression that exists in memory.
Examples of lvalues are a simple variable {\tt x}, the cell of an
array {\tt t[i]}, or a pointed location {\tt *p}. Because an lvalue
lies in memory, it may have uninitialized contents.
Using an
lvalue when it has not yet been initialized is a forbidden
behavior that compilers do not detect in all cases, and this
is only one of the numerous pitfalls of C.}. 
Actually, the analyzer finds that for one execution
branch, it is certain to be uninitialized. This is when reading
the memory location ``\lstinline|hash[i]|'', one of the arguments given
to the \lstinline|printf| function.
Extra information can be observed using the GUI, by clicking on the 
\lstinline|hash| expression in line 14 and reading the \textsf{Values} tab,
which displays the following values for \lstinline|hash|:
\begin{logs}
  [0] $\in$ {200}
  [1..7] $\in$ UNINITIALIZED
\end{logs}
This means that element \lstinline|hash[0]| has value 200, but elements 1 to 7
are definitely uninitialized. Therefore the first iteration of the loop
proceeds as expected, but on the second iteration, the analysis of the branch
is stopped, because
from the point of view of the analyzer reading an uninitialized
value should be a fatal error. And there are no other execution
branches that reach the end of this loop, which is why the analyzer
finds that function \lstinline|main| does not terminate.

The next warning, \lstinline|accessing uninitialized left-value...|
is emitted whenever an accessed value may be uninitialized, and the emitted
assertion, \lstinline|\initialized(&hash[i])|, could not be verified by
\Eva{}, and must therefore be verified by other means
(eventually, other Frama-C analyses) to ensure the behavior of the code is
defined.

\subsection{Making sense of alarms}

There were three alarms in the analysis. The other two are related to
interactions between
the Frama-C standard library and our own implementations of \verb|memset|/
\verb|memcpy|\footnote{Namely, the fact that their specifications involve
{\em logic predicates} that are not yet understood by \Eva{} leads
to an ``Unknown'' status, even though they are valid in this case.},
but we can safely ignore them here.


%% There are three alarms in the analysis we now have, and two of them are related
%% to the (lack of) \lstinline|printf| specification, because we did not include
%% that of the Frama-C standard library. Because \lstinline|printf| does not
%% modify the values of any variables we are interested in, and we do not want to
%% provide it with some code, we can simply add a minimalistic ACSL specification
%% to the \lstinline|printf| prototype in line 3:

%% \begin{listing}{3}
%% /*@ assigns \result \from \nothing; */
%% int printf(const char*,...);
%% \end{listing}

%% This is incorrect in the general case (since it claims that the result of
%% \lstinline|printf| is a constant value, independent of its arguments),
%% but since we ignore its result in our code anyway, this is fine here.
%% This is by the way equivalent to the default behavior that was previously
%% inferred by \Eva{}, and the reason why it emitted warnings:
%% it is not a correct specification in the general case.

Furthermore, the program
is also found not to terminate. This means that every execution
either encounters the problem from line 14 in \lstinline|main_1.c|,
or an infinite loop somewhere else in the program.
We should, therefore, pay particular attention to this alarm.

Looking again at the header file \lstinline|skein.h|, we may now
notice that in function \lstinline|Skein_256_Init|,
the formal parameter for the desired hash length is named
\lstinline|hashBitLen|. This parameter should certainly be expressed in bits!
We were inadvertently asking for a 1-char hash of the message
since the beginning, and the test that we ran as our first
step failed to notice it.
Our first imprecise analysis  did find it -- the same alarm that
pointed out the problem is present among the other alarms produced
by the first analysis. Note that because of the imprecisions,
the first analysis was not able to conclude that the uninitialized access
at \lstinline|hash[2]| was certain, making it much less noticeable
among the others.

The bug can be fixed by passing
8*HASHLEN instead of HASHLEN as the second argument of 
\lstinline|Skein_256_Init|.
With this fix in place, the analysis
with \lstinline|-eva-slevel 100| produces no alarms (apart from the postconditions
of \verb+memcpy+ and \verb+memset+) and gives the following result:
\begin{logs}
Values for function main:
  hash[0] $\in$ {40}
      [1] $\in$ {234}
      [2] $\in$ {138}
      [3] $\in$ {230}
      [4] $\in$ {134}
      [5] $\in$ {120}
      [6] $\in$ {37}
      [7] $\in$ {35}
  i $\in$ {8}
\end{logs}

Meanwhile, compiling and executing the fixed test produces the result:
\begin{logs}
40
234
138
230
134
120
37
35
\end{logs}

\section{Guaranteeing the absence of bugs}

\subsection{Generalizing the analysis to arbitrary messages of fixed length}

The analysis we have done so far is very satisfying because
it finds problems that are not detected by a C compiler or by testing.
The results of this analysis only prove 
the absence of run-time errors\footnote{and the absence
of conditions that
{\em should} be run-time errors -- like the uninitialized access already
encountered} when the particular message that we
chose is being hashed, though.
It would be much more useful to have the assurance that there are
no run-time errors for any input message, especially since the library
might be under consideration for embedding in a device where anyone
(possibly a malicious user) will be able to choose the message
to hash.

A first generalization of the previous analysis  is to include
in the subject matter
the hashing of all possible 80-character messages. We can do this
by separating the analyzed program in two distinct phases, the first
one being the construction of a generalized analysis context
and the second one being made of the sequence of function calls
that we wish to study:

\listinginputcaption{1}{\texttt{main\_2.c}}{tutorial/main_2.c}

From this point onward the program is no longer executable because
of the call to builtin primitives such as \lstinline|Frama_C_interval|.
We therefore dispense with the final calls to \lstinline|printf|,
since \Eva{} offers simpler ways to observe intermediate
and final results. Note that we included \verb|__fc_builtin.h|, a file
that comes from the Frama-C distribution and which 
defines \lstinline|Frama_C_interval|. Running Frama-C on this
file (without \verb|main_1.c| in the same directory, to avoid having two
definitions for \verb|main|):
\begin{shell}
frama-c -eva-slevel 100 -eva *.c >log 2>&1
\end{shell}

This time, the absence of actual alarms is starting to be really interesting:
it means that it is formally excluded that the functions
\lstinline|Skein_256_Init|, \lstinline|Skein_256_Update|, and
\lstinline|Skein_256_Final| produce a run-time error when
they are used, in this order, to initialize a local variable
of type \lstinline|Skein_256_Ctxt_t| (with argument 64 for the size),
parse an arbitrary message and produce a hash in a local \lstinline|u08b_t|
array of size~8.

\subsection{Verifying functional dependencies}

If we had written a formal specification for function Skein, we would soon
have expressed that we expected it to modify the buffer \lstinline|hash|
that is passed to \lstinline|Skein_256_Final| (all 8 bytes of the buffer),
to compute the new contents of this buffer from the contents of the
input buffer \lstinline|msg| (all 80 bytes of it), and from nothing else.

During \Eva{}'s analysis, we have seen that all of the buffer \verb|hash|
was always modified in the conditions of the analysis: the reason is that
this buffer was uninitialized before the sequence of calls, 
and guaranteed to be initialized after them.

We can get the complete list of locations that may be modified by each
function by adding the option \verb|-out| to the other options we were 
already using. These locations are computed in a quick analysis that comes
after \Eva{}. In the results of this analysis, we find:
\begin{logs}