Newer
Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[web]{frama-c-book}
\usepackage{microtype}
\usepackage{lmodern}
\usepackage{booktabs}
% Commandes pour mettre des ref dans l'index :
\newcommand{\idb}[1]{\textbf{#1}}
\newcommand{\indextxt}[1]{\index{#1}{\bf #1}}
\newcommand{\indexdef}[1]{\index{#1|idb}{{\bf #1}}}
\newcommand{\indextxtdef}[2]{\index{#2|idb}{{\bf #1}}}
\renewcommand{\labelitemii}{$\triangleright$}
\newcommand{\bropen}{\mbox{\tt [}}
\newcommand{\brclose}{\mbox{\tt ]}}
\newcommand{\cbopen}{\mbox{\tt \{}}
\newcommand{\cbclose}{\mbox{\tt \}}}
\newcommand{\framacversion}%
{\input{../../VERSION} (\input{../../VERSION_CODENAME}\unskip)}
\newcommand{\Eva}{\textsf{Eva}}
\lstset{literate=%
{∈}{{$\in$ }}1
}
% The following definitions are used to differentiate between
% the open-source manual and the closed-source one.
% Source: http://tex.stackexchange.com/questions/62010
% Note: does NOT work with xetex/luatex.
% Also, note that some weird situations may arrive inside
% \ifdefstring blocks, e.g. \lstinline blocks require escaping
% backslashes, otherwise this file will not parse.
\usepackage{etoolbox}
\usepackage{catchfile}
\newcommand{\getenv}[2][]{%
\CatchFileEdef{\temp}{"|kpsewhich --var-value #2"}{\endlinechar=-1\relax}%
\if\relax\detokenize{#1}\relax\temp\else\edef#1{\temp}\fi}
% define \OPENSOURCE to the value of the OPENSOURCE environment variable
\getenv[\OPENSOURCE]{OPENSOURCE}
%==============================================================================
\begin{document}
\coverpage{Eva – The Evolved Value Analysis plug-in}
\begin{titlepage}
\begin{flushleft}
\includegraphics[height=14mm]{cealistlogo.jpg}
\end{flushleft}
\vfill
\title{The \Eva{} plug-in}{\framacversion}
\author{David Bühler, Pascal Cuoq and Boris Yakobowski. \\
With Matthieu Lemerre, André Maroneze,
Valentin Perrelle and Virgile Prevosto}
\begin{tabular}{l}
CEA LIST, Software Reliability Laboratory, Saclay, F-91191 \\
\end{tabular}
\vfill
\begin{flushleft}
\textcopyright 2011-\the\year{} CEA LIST
This work has been supported by the ANR project U3CAT
(ANR-08-SEGI-021-01).
\end{flushleft}
\end{titlepage}
\tableofcontents
\chapter{Introduction}\label{introduction}
\vspace{2cm}
{\em Frama-C is a modular static analysis framework for the C language.\\
This manual documents the \Eva{} (for \emph{Evolved Value Analysis}) plug-in of Frama-C.}
\vspace{2cm}
The \Eva{} plug-in automatically computes sets of possible
values for the variables of an analyzed program. \Eva{}
warns about possible run-time errors in the analyzed program.
Lastly, synthetic information
about each analyzed function can be computed automatically from the
values provided by \Eva{}: these functionalities (input
variables, output variables, and information flow) are also documented
here. \bigskip
The framework, the \Eva{} plug-in and the other plug-ins
documented here are all Open Source software. They can be
downloaded from \url{http://frama-c.com/}.
\bigskip
In technical terms, \Eva{} is context-sensitive and
path-sensitive. In non-technical terms, this means that it tries to
offer precise results for a large set of C programs. Heterogeneous
pointer casts, function pointers, and floating-point computations are
handled.
\section{First contact}
Frama-C comes with two interfaces: batch and interactive.
The interactive graphical interface of Frama-C displays
a normalized version of the analyzed source code.
In this interface, the \Eva{} plug-in allows the user
to select an expression in the
code and observe an over-approximation of the set of
values this expression can take at run-time.
Here is a simple C example:
\listinginputcaption{1}{\texttt{introduction.c}}{introduction.c}
If either interface of Frama-C is launched with options
\lstinline|-eva introduction.c|,
the \Eva{} plug-in is able to guarantee that at each passage through
the \lstinline|return| statement of
function \lstinline|f|, the global variables \lstinline+y+ and
\lstinline+z+ each contain either 1 or 3.
At the end of function \lstinline|main|,
it indicates that \lstinline|y|
necessarily contains 4, and the value of \lstinline|z| is again 1 or 3.
When the plug-in indicates the value of \lstinline|y| is 1 or 3
at the end of function \lstinline|f|, it implicitly computes the union
of all the values in \lstinline|y|
at each passage through this program point throughout any execution.
In an actual execution of this deterministic program,
there is only one passage though the end
of function \lstinline|main|, and therefore only one value for \lstinline|z|
at this point.
The answer given by \Eva{} is approximated but correct (the actual
value, 3, is among the proposed values).
The theoretical framework on which \Eva{} is founded is called
Abstract Interpretation and has been the subject of extensive research
during the last forty years.
\section{Run-time errors and the absence thereof}
An analyzed application may contain run-time errors: divisions by zero,
invalid pointer accesses,\ldots
\listinginputcaption{1}{\texttt{rte.c}}{rte.c}
When launched with \lstinline|frama-c -eva rte.c|,
\Eva{} emits a warning about an out-of-bound access at line 5:
\begin{logs}
rte.c:5:[kernel] warning: accessing out of bounds index. assert i < 10;
\end{logs}
There is in fact an out-of-bounds access at this line in the program.
It can also happen that, because of approximations during its computations,
Frama-C emits warnings for constructs
that do not cause any run-time errors. These are called ``false alarms''.
On the other hand, the fact that \Eva{}
computes correct, over-approximated sets of possible values prevents it
from remaining silent on a program that contains a run-time error. For
instance, a particular division in the analyzed program
is the object of a warning
as soon as the set of possible values for the divisor
contains zero. Only
if the set of possible values computed by \Eva{}
does not contain zero is the warning omitted, and
that means that the divisor really cannot be null at run-time.
\section{From Value Analysis to \Eva{}}
The \Eva{} plug-in was previously called the \emph{Value Analysis plug-in}.
Following major changes in its expressivity and overall precision,
the plugin was subsequently renamed to \emph{Evolved Value Analysis}, or
\Eva{} for short. Those changes were first made available with the
Aluminium version of Frama-C. They are presented in section~\ref{sec:eva}.
\section{Other analyses based on \Eva{}}
Frama-C also provides
synthetic information on the behavior of analyzed functions:
inputs, outputs, and dependencies. This information is computed
from the results of the \Eva{} plug-in,
and therefore some familiarity with \Eva{}
is necessary to get the most of these computations.
\chapter{Tutorial}\label{tutorial}
\vspace{2cm}
{\em Some use cases for \Eva{}\ldots}
\vspace{2cm}
\section{Introduction}
Throughout this tutorial, we will see on a single example how to use
\Eva{} for the following tasks :
\begin{enumerate}
\item to get familiar with foreign code,
\item to produce documentation automatically,
\item to search for bugs,
\item to guarantee the absence of bugs.
\end{enumerate}
It is useful to stick to a single example in this tutorial, and
there is a natural progression to the list of results above,
but in real life, a single person would generally focus on
only one or two of these four tasks for a given
codebase. For instance, if you need Frama-C's help to reverse
engineer the code as in tasks 1 and 2, you have
probably not been provided with the quantity of documentation
and specifications
that is appropriate for meaningfully carrying out task 4.
Frama-C helps you to achieve tasks 1-3 in less time than you would
need to get the same results using the traditional approach of writing
tests cases. Task 4, formally guaranteeing the absence of bugs, can in
the strictest sense not be achieved at all using tests for two
reasons. Firstly, many forms of bugs that occur in a C program
(including buffer overflows) may cause the behavior of the program to
be non-deterministic. As a consequence, even when a test suite comes
across a bug, the faulty program may appear to work correctly during
tests and fail later, after it has been deployed. Secondly, the
notion of coverage of a test suite in itself is an invention made
necessary because tests aren't usually exhaustive. Assume a
function's only inputs are two 32-bit integers, each allowed to take
the full range of their possible values. Further assume that this
function only takes a billionth of a second to run (a couple of cycles
on a 2GHz processor). A pencil and the back of an envelope show that
this function would take 600 years to test exhaustively. Testing
coverage criteria can help decide a test suite is ``good enough'', but
there is an implicit assumption in this reasoning. The assumption is
that a test suite that satisfies the coverage criteria finds all the
bugs that need to be found, and this assumption is justified
empirically only.
\section{Target code: Skein-256}
A single piece of code, the reference implementation for the Skein
hash function, is used as an example throughout this document. As of
this writing, this implementation is available at
\url{http://www.schneier.com/code/skein.zip}. The Skein function is
Niels Ferguson, Stefan Lucks, Bruce Schneier, Doug Whiting, Mihir
Bellare, Tadayoshi Kohno, Jon Callas, and Jesse Walker's entry in the
NIST cryptographic hash algorithm competition for SHA-3.
Cryptographic hash functions are one of the basic building blocks from
which cryptographic protocols are designed. Many cryptographic
protocols are designed with the assumption that an implementation for
a function $h$ from strings to fixed-width integers is available with
the following two properties:
\begin{itemize}
\item it is difficult to find two distinct, arbitrary strings $s_1$ and $s_2$
such that $h(s_1)=h(s_2)$,
\item for a given integer $i$, it is difficult to build a string $s$
such that $h(s)=i$.
\end{itemize}
A function with the above two properties is called a cryptographic hash
function. It is of the utmost importance that the function actually
chosen in the implementation of a cryptographic application satisfies the
above properties! Any weakness in the hash function can be
exploited to corrupt the application.
Proofs of security for
cryptographic hash functions are complicated mathematical affairs.
These proofs are made using a mathematical definition
of the function. Using static analysis for verifying that a piece
of code corresponds to the mathematical model
which was the object of the security proofs is an interesting
topic but is outside the scope of this tutorial.
In this document, we will not be using \Eva{} for
verifying cryptographic properties.
We will, however, use \Eva{} to familiarize ourselves
with Skein's reference implementation, and we will see that
it can be a more useful tool than, say, a C
compiler, to this effect. We will also use \Eva{} to look
for bugs in the implementation of Skein, and finally prove that the
functions that implement Skein may never cause a run-time error for a
general use pattern. Because of the numerous pitfalls of the C
programming language, any amount of work at the
mathematical level cannot exclude the possibility of problems
such as buffer overflows in the implementation.
It is a good thing to be able to rule these out with \Eva{}.
\Eva{} is most useful for embedded or embedded-like
code. Although the Skein library does not exactly fall in this category,
it does not demand dynamic allocation and uses few functions from external
libraries, so it is well suited to this analysis.
\section{Using \Eva{} for getting familiar with Skein}
\subsection{Writing a test}
After extracting the archive \lstinline|skein_NIST_CD_010509.zip|,
listing the files in\\
\lstinline|NIST/CD/Reference_Implementation| shows:
\begin{listing}{1}
-rw-r--r-- 1 pascal 501 4984 Oct 14 00:52 SHA3api_ref.c
-rw-r--r-- 1 pascal 501 2001 Oct 14 00:54 SHA3api_ref.h
-rw-r--r-- 1 pascal 501 6141 Sep 30 00:28 brg_endian.h
-rw-r--r-- 1 pascal 501 6921 May 17 2008 brg_types.h
-rw-r--r-- 1 pascal 501 34990 Jan 6 01:39 skein.c
-rw-r--r-- 1 pascal 501 16290 Nov 9 05:48 skein.h
-rw-r--r-- 1 pascal 501 18548 Oct 7 20:02 skein_block.c
-rw-r--r-- 1 pascal 501 7807 Oct 10 02:47 skein_debug.c
-rw-r--r-- 1 pascal 501 2646 Oct 9 23:44 skein_debug.h
-rw-r--r-- 1 pascal 501 1688 Jul 3 2008 skein_port.h
\end{listing}
The most natural place to go next is the file \lstinline|skein.h|,
since its name hints that this is the header with the
function declarations that should be called from outside the library.
Scanning the file quickly, we may notice declarations such as
\begin{listing-nonumber}
typedef struct /* 256-bit Skein hash context structure */
{
[...]
} Skein_256_Ctxt_t;
/* Skein APIs for (incremental) "straight hashing" */
int Skein_256_Init (Skein_256_Ctxt_t *ctx, size_t hashBitLen);
[...]
int Skein_256_Update(Skein_256_Ctxt_t *ctx, const u08b_t *msg,
size_t msgByteCnt);
[...]
int Skein_256_Final (Skein_256_Ctxt_t *ctx, u08b_t * hashVal);
\end{listing-nonumber}
The impression we get at first glance is that the hash
of an 80-char message containing
the string ``People of Earth, your attention, please''
can be computed as simply as declaring a variable of
type \lstinline|Skein_256_Ctxt_t|, letting \lstinline|Skein_256_Init|
initialize it, passing \lstinline|Skein_256_Update| a representation
of the string, and calling \lstinline|Skein_256_Final| with the
address of a buffer where to write the hash value.
Let us write a C program that does just that:
\listinginputcaption{1}{\texttt{main\_1.c}}{tutorial/main_1.c}
In order to make the test useful, we have to print
the obtained hash value. Because the result we are interested in is
an 8-byte number, represented as a char array of
arbitrary characters (some of them non-printable), we cannot
use string-printing functions, hence the \lstinline|for| loop at lines 17-18.
\goodbreak
With luck, the compilation goes smoothly and we obtain
the hash value:
\begin{shell}
gcc *.c
./a.out
\end{shell}
\begin{logs}
215
215
189
207
196
124
124
13
\end{logs}
\subsection{First try at an analysis using \Eva{}}
Let us now see how \Eva{} works on the same example.
\Eva{} can be launched with the following command.
The analysis should not take more than a few seconds:
\begin{listing-nonumber}
frama-c -eva *.c >log
\end{listing-nonumber}
Frama-C has its own model of the target platform (the default target is
a little-endian 32-bit platform). It also uses the host system's preprocessor.
If you want to do the analysis for a different platform than the host platform,
you need to provide Frama-C with a way to pre-process the files as they
would be during an actual compilation.
There are analyses where the influence of host platform parameters
is not noticeable. The analysis we are embarking on is not one of them.
If you pre-process the Skein source code with a 64-bit compiler and
then analyze it with Frama-C targeting its default 32-bit platform,
the analysis will be meaningless and you won't be able to make sense of
this tutorial.
If you are using a 64-bit compiler, simply match
Frama-C's target platform with your host header files by systematically adding
the option \verb|-machdep x86_64| to all commands in
this tutorial.
The ``\lstinline|>log|'' part of the command sends all
the messages emitted by Frama-C into a file named
``\lstinline|log|''. \Eva{} is verbose for a number of reasons that
will become clearer later in this tutorial.
The best way to make sense of the information
produced by the analysis is
to send it to a log file.
There is also a Graphical User Interface
for creating analysis projects and
visualizing the results.
The GUI is still evolving quickly at this stage
and it would not make much sense to
describe its manipulation in detail in this document.
You are however encouraged to try it if it is available on your
platform.
One way to create an analysis project in the GUI
is to pass the command \lstinline|frama-c-gui| the same options
that would be passed to \lstinline|frama-c|. In this first example,
the command \lstinline|frama-c-gui -eva *.c| launches
the same analysis and then opens the GUI for inspection of the results.
The initial syntactic analysis and symbolic link phases
of Frama-C may find issues such as
inconsistent types for the same symbol in different files.
Because of the way separate compilation is implemented,
the issues are not detected by standard C compilers.
It is a good idea to check for these issues in the first
lines of the log.
\subsection{Missing functions}
Since we are trying out the library for the first time,
something else to look for in the log file is the list of functions for
which the source code is missing. \Eva{} requires
either a body or a specification for each function it analyses.
\begin{listing-nonumber}
grep "Neither code nor specification" log
\end{listing-nonumber}
This should match lines indicating functions that are both undefined
(without source) and that have no specification in Frama-C's standard library.
In this example, no functions will be output, since Frama-C's standard library
already has prototypes for the most important library functions.
But suppose you would have a custom printing
function, say \verb|print| instead of \verb|printf|, or some other function
with missing source code. For instance, change the code in \verb|main_1.c|,
replacing \verb|printf| with \verb|print|, re-run \Eva{},
and then grep the log again. You will obtain this output:
\begin{logs}
main_1.c:14:[kernel] warning: Neither code nor specification for function print,
generating default assigns from the prototype
\end{logs}
The warning indicates that \Eva{} is trying to infer an
{\em assigns} clause for the function. This will be explained in detail later
(\ref{annot_assigns}), but for now the important part is that \Eva{}
needs either source code or an ACSL specification in order to work.
Without those, it can only guess what each function does from the function's
prototype, which is both inaccurate and likely incorrect.
Once we are convinced that all functions have either a body or a specification,
we can focus on functions without bodies:
\begin{listing-nonumber}
grep "using specification for function" log
[eva] using specification for function memset
[eva] using specification for function memcpy
[eva] using specification for function printf
\end{listing-nonumber}
We find three functions. Not having a body for \verb|printf| is fine,
since calls to this function have no observable effects for the analysis.
Things are not so clear-cut for
\verb|memcpy| and \verb|memset|, for which we will provide bodies later.
It is also possible to obtain a list of
missing function definitions by using the command:
\begin{listing-nonumber}
frama-c -metrics -metrics-libc *.c
\end{listing-nonumber}
This command computes, among other pieces of information,
a list of missing functions using
a syntactic analysis. It is not exactly equivalent to
grepping the log of \Eva{} because it lists
all the functions that are missing, while the log of
\Eva{} only cites the functions that would have
been necessary to an analysis. When
analyzing a small part of a large codebase, the latter list
may be much shorter than the former.
In this case, relying on the information displayed by \lstinline|-metrics|
means spending time hunting for functions that are not actually
necessary.
The output of \verb|frama-c -metrics -metrics-libc *.c|
will then include the following lines:
\begin{logs}
Undefined functions (83)
=======================
[...]
gets (0 call); memchr (0 call); memcmp (0 call); memcpy (21 calls);
memmove (0 call); memset (27 calls); perror (0 call); printf (1 call)
\end{logs}
All functions included from \verb+string.h+ are undefined, but only
\verb+memcpy+, \verb+memset+ and \verb+printf+ are called.
To handle the absence of functions \lstinline|memset| and \lstinline|memcpy|%
\footnote{Since \FramaC 18 (Argon), \Eva{} includes builtin functions for
\lstinline|memset| and \lstinline|memcpy|,
thus providing source code for them is no longer necessary.},
we will provide source code for them.
Sometimes it is a good idea to provide the
exact source code that will actually be used in an actual
running environment, so as to detect subtle bugs in the interactions
between the program and the system libraries.
On the other hand, string manipulation functions often
feature difficult-to-analyze
architecture-dependent optimizations.
For these, it is better to provide a simpler but equivalent implementation.
In this particular case, let us provide these two
functions in a file that we will place with the others and
name ``\lstinline|lib.c|''.
\listinginput{1}{tutorial/lib.c}
%% Some compilation platforms' headers define the names \lstinline|memset|
%% and \lstinline|memcpy| in ways that make it impossible to provide
%% your own implementation for these functions. If this happens for
%% yours, you can try placing your own header in the analysis directory,
%% under the name ``\lstinline|string.h|''
%% \listinginput{1}{tutorial/string.h}
%% In some circumstances, the results of \Eva{} may
%% remain useful even when letting Frama-C guess the effects
%% of missing functions. In the case of functions that take
%% pointers as arguments, however, the possibilities are too numerous
%% for the analyzer to be able to guess right.
%% In this example, the function \lstinline|printf| is safe to
%% leave without an implementation, because
%% unlike the other two,
%% it does not have effects on the variables of the program.
%. It is generally far simpler to provide
%either the actual code or a model written in C
%for the missing functions.
For some functions -- including those two -- an implementation is
already provided in \FramaC. It can be found in the \lstinline|.c|
files of the directory \lstinline|share/frama-c/libc|, usually within
a file identically named to the header declaring the functions. For
\lstinline|memcpy| and \lstinline|memset|, the implementation can thus
be found in \lstinline|string.c|, and adding it to the list of files
to be analyzed would make the warning disappear.
%
However, we are going to assume that such implementations are not available
in the remainder of this tutorial. Since quite a few functions do not have
any implementation in \FramaC's \lstinline|libc|, this allows us to show how
to deal with the general case.
\subsection{First meaningful analysis}
If we run the analysis again with definitions for \lstinline|memset|
and \lstinline|memcpy|, the newly obtained log no longer mentions
any missing function other than \lstinline|printf|\footnote{It is
a good idea to
check again for missing functions because new execution paths
could have been activated by the implementations of functions
that were previously missing.}.
The log next contains a description of the initial values of
variables. Apart from the variables that we defined ourselves, there
is a rather strange one, named \lstinline|ONE| and indeed containing
\lstinline|1|. Searching the source code reveals that this variable is
in fact declared \lstinline|static| as it should be. Frama-C can
display the value of static variables (something that is annoying to
do when using a C compiler for testing). Frama-C may rename
static variables in order to distinguish them from another variable
with the same name. The GUI displays the original source code
alongside the transformed one.
A quick inspection shows that the Skein functions
use variable \lstinline|ONE| to detect endianness.
Frama-C assumes a little-endian architecture
by default, so \Eva{} is only analyzing the
little-endian version of the library (Frama-C
also assumes an IA-32 architecture, so we are only
analyzing the library as compiled and run on this architecture).
The big-endian version of the library could be analyzed
by reproducing the same steps we are taking here for
a big-endian configuration of Frama-C.
You will also see lots of variable names
prefixed by \verb|__fc_|, \verb|__FC_| and \verb|S___fc_|.
These are all variables coming from ACSL specifications in the Frama-C
standard library.
The next line in the log is a progression message, that indicates
that \Eva{} has encountered a loop and is performing an approximations.
Those messages can be ignored for now.
\begin{logs}
lib.c:17:[eva] entering loop for the first time
\end{logs}
The next log entry which is not a progression message is a
warning concerning variable \lstinline|tmp|
\begin{logs}
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp);
(tmp from ptr++)
\end{logs}
There is no variable \lstinline|tmp| in our file \lstinline|lib.c|,
but as indicated in the message, this variable is introduced by the source
code normalization of the \verb|ptr++| expression.
You can also see the relation between this variable
and the original source code in the GUI.
Alternately, the command \lstinline|frama-c *.c -print|
shows how the original source code has been
transformed by Frama-C (note, however, that the location
\lstinline|lib.c:18| refers to the original source file).
Here, it is the statement \lstinline|*ptr++ = val;| in the
function \lstinline|memset| that has been transformed by
Frama-C into the sequence below, so that it would not
be ambiguous which value of \lstinline|ptr|
has to be a valid pointer.
\begin{listing-nonumber}
tmp = ptr;
ptr ++;
*tmp = (unsigned char)val;
\end{listing-nonumber}
The command \lstinline|frama-c *.c -eva -print| launches
\Eva{} and then prints the transformed source code,
annotated with the
alarms raised by \Eva{}. In our case, the
function \lstinline|memset| is transformed in
\begin{listing-nonumber}
void *memset(void *dest, int val, size_t len)
{
unsigned char *ptr;
ptr = (unsigned char *)dest;
while (1) {
size_t tmp_0;
unsigned char *tmp;
{ /* sequence */
tmp_0 = len;
len --;
;
}
if (! (tmp_0 > (size_t)0)) break;
{ /* sequence */
tmp = ptr;
ptr ++;
/*@ assert Value: mem_access: \valid(tmp); */
*tmp = (unsigned char)val;
}
}
return dest;
}
\end{listing-nonumber}
As we will find out later in this tutorial, this alarm is a false
alarm, an indication of a potential problem in a place where there is
in fact none. On the other hand, \Eva{} never remains
silent when a risk of run-time error is present, unlike many other
static analysis tools. The important consequence is that if you get
the tool to remain silent, you have {\em formally verified} that
no run-time error could happen when running the program.
It may seem that the access to \lstinline|*tmp| is the
only dangerous operation in the function \lstinline|memset|,
and therefore that the analyzer is not doing a very good job
of pointing only the operations that are problematic.
While it is true that the results obtained on this example
are not very precise, the comparison
\lstinline|>| is also considered dangerous
by the analyzer. This operation may be unspecified when applied
to an invalid pointer. Consider a program that does the
following:
\begin{listing}{1}
int main()
{
int a;
return ((unsigned int)(&a - 1432)) > 0;
}
\end{listing}
The programmer might misguidedly have written the \lstinline|>0|
test in this example as a way to test for \lstinline|NULL|.
Ey\footnote{This manual uses Spivak pronouns:
\url{http://en.wikipedia.org/wiki/Spivak_pronoun}} might
expect this program always to return 1, since
\lstinline|(&a - 1432)| is not \lstinline|NULL|. But this
program may in fact return 0 if, out of bad luck,
the compiler places variable \lstinline|a| precisely at address
1432. This kind of bug would be very
difficult to find and/or reproduce by testing.
The analyzer does not emit any alarm for the comparison
\lstinline|tmp_0 > (size_t )0| in the normalized source code
for \lstinline|memset|, and
this means that it guarantees\footnote{The fine print is
the reference part of this manual} that
this particular comparison is never non-deterministic
because of an issue such as the one just described.
This conclusion can only be reached by looking at
the arguments actually passed to the function \lstinline|memset|.
Therefore, this conclusion is only valid for all the
possible executions coming from the \lstinline|main|
function that we provided, and not for all the possible
calls to function \lstinline|memset| that a programmer
could write.
One should not spend too much time at this stage trying to
determine if the dozens or so alarms emitted by the analyzer
are true alarms that indicate an actual problem or false alarms
that don't.
The most important information is that the analysis did not
take an unreasonable time.
% The second most important information is that the analysis
% seems to have explored all the paths we intended for it to
% explore, as seen in the list of functions for which values are
% printed or in the log entries such as:
% \begin{logs}
% [value] computing for function RotL_64 <- Skein_256_Process_Block
% <- Skein_256_Final <- main.
% Called from skein_block.c:100.
% \end{logs}
The GUI allows to inspect the sets of values
obtained during the analysis and to get
a feeling of how it works. Right-clicking on the function name
in a function call brings a contextual menu that allows to go
to the function's definition and to inspect it in turn.
In order to return to the caller,
right-clicking on the name of the current function at the top
of the normalized code buffer brings a contextual menu
with a list of callers.
You can also use the Back button (left arrow) in the toolbar.
\section{Searching for bugs}
\subsection{Increasing the precision with option {\tt -eva-slevel}}
Because we compiled and executed the same program that we are
now analyzing, we are confident that most of the alarms
displayed by \Eva{} are false alarms that do not
correspond to actual problems. In fact, because the program
is deterministic and takes no inputs,
only one of the alarms can be a true alarm in
the strictest sense. The analysis stops when an error is
encountered (what would it even mean to continue the analysis
after, for instance, \lstinline|NULL| has been dereferenced?).
It is only because \Eva{} is uncertain about the errors
encountered here that it continues and finds more possible
errors.
Before we spend any of our time looking at each of these alarms,
trying to determine whether it is true or false,
it is a good idea to make the analyzer spend more of its time
trying to determine whether each alarm is true or false.
There are different settings that influence the compromise
between precision and resource consumption.
If you chose to remember only one of these settings, it would
have to be the option \lstinline|-eva-slevel|. This option has
two visible effects:
it makes the analyzer unroll loops, and it makes it propagate
separately the states that come from the \lstinline|then|
and \lstinline|else| branches of a conditional statement.
This makes the analysis more precise (at the cost of being
slower) for almost every program that can be analyzed.
\Eva{} has different ways to provide information on
the loss of precision that causes the false alarms.
This is another reason why it is so verbose during an analysis:
it tries to provide enough information for a motivated user
to be able to understand what is happening during the analysis,
where the imprecise values that cause the false alarms came from,
and what instructions in the program made them imprecise.
But instead of spending precious human time making use
of the provided information, the brute-force approach of blindly
applying the \lstinline|-eva-slevel| option must be tried first.
Indeed, \lstinline|-eva-slevel| is one of the options of \Eva{}
that can never cause it to become incorrect. Such options,
when used wrongly, may make the analysis slower,
provide hints that are not
useful, or even possibly counter-productive (making the analysis
less precise), but they cannot
prevent the analyzer from reporting a problem where there is one.
The \lstinline|-eva-slevel| option is useful for unrolling loops. We do
not know without further inspection how many iterations the loops
inside the program need, but the input message is 80 characters long,
and we can assume that it is read inside a loop, so using the option
\lstinline|-eva-slevel 100| should have a good chance of unrolling at
least that loop. Furthermore, it does not make the analysis slower to
use a number greater than the value actually necessary to unroll
every loop and conditional in the program. We are limiting ourselves
to 100 in this first try because we do not know how much time the
analysis will take with this number of unrolled branches. If, with
the parameter \lstinline|-eva-slevel 100|, the precision of the analysis still isn't
satisfactory and the time spent by the analyzer remains reasonable, we
can always try a higher value. This should be progressive, because a value
higher than a few hundreds, when there really are that many branches to
unroll in the program, can make the analysis very slow.
\begin{shell}
frama-c -eva-slevel 100 -eva *.c >log
\end{shell}
The analysis goes rather far without finding any alarm,
but when it is almost done (after the analysis of function
\lstinline|Skein_256_Final|), it produces:
\begin{logs}
...
main_1.c:14:[kernel] warning: Neither code nor specification for function printf,
generating default assigns from the prototype
[eva] using specification for function printf
[eva] Done for function printf
main_1.c:14:[eva] completely invalid value in evaluation of
argument (int)hash[i]
main_1.c:14:[kernel] warning: accessing uninitialized left-value.
assert \initialized(&hash[i]);
[eva] Recording results for main
[eva] done for function main
...
\end{logs}
These messages mean that lvalue \lstinline|hash[i]|, as found
in line 14 of our file \lstinline|main_1.c|, may be
uninitialized\footnote{An lvalue is a C expression that exists in memory.
Examples of lvalues are a simple variable {\tt x}, the cell of an
array {\tt t[i]}, or a pointed location {\tt *p}. Because an lvalue
lies in memory, it may have uninitialized contents.
Using an
lvalue when it has not yet been initialized is a forbidden
behavior that compilers do not detect in all cases, and this
is only one of the numerous pitfalls of C.}.
Actually, the analyzer finds that for one execution
branch, it is certain to be uninitialized. This is when reading
the memory location ``\lstinline|hash[i]|'', one of the arguments given
to the \lstinline|printf| function.
Extra information can be observed using the GUI, by clicking on the
\lstinline|hash| expression in line 14 and reading the \textsf{Values} tab,
which displays the following values for \lstinline|hash|:
\begin{logs}
[0] $\in$ {200}
[1..7] $\in$ UNINITIALIZED
\end{logs}
This means that element \lstinline|hash[0]| has value 200, but elements 1 to 7
are definitely uninitialized. Therefore the first iteration of the loop
proceeds as expected, but on the second iteration, the analysis of the branch
is stopped, because
from the point of view of the analyzer reading an uninitialized
value should be a fatal error. And there are no other execution
branches that reach the end of this loop, which is why the analyzer
finds that function \lstinline|main| does not terminate.
The next warning, \lstinline|accessing uninitialized left-value...|
is emitted whenever an accessed value may be uninitialized, and the emitted
assertion, \lstinline|\initialized(&hash[i])|, could not be verified by
\Eva{}, and must therefore be verified by other means
(eventually, other Frama-C analyses) to ensure the behavior of the code is
defined.
\subsection{Making sense of alarms}
There were three alarms in the analysis. The other two are related to
interactions between
the Frama-C standard library and our own implementations of \verb|memset|/
\verb|memcpy|\footnote{Namely, the fact that their specifications involve
{\em logic predicates} that are not yet understood by \Eva{} leads
to an ``Unknown'' status, even though they are valid in this case.},
but we can safely ignore them here.
%% There are three alarms in the analysis we now have, and two of them are related
%% to the (lack of) \lstinline|printf| specification, because we did not include
%% that of the Frama-C standard library. Because \lstinline|printf| does not
%% modify the values of any variables we are interested in, and we do not want to
%% provide it with some code, we can simply add a minimalistic ACSL specification
%% to the \lstinline|printf| prototype in line 3:
%% \begin{listing}{3}
%% /*@ assigns \result \from \nothing; */
%% int printf(const char*,...);
%% \end{listing}
%% This is incorrect in the general case (since it claims that the result of
%% \lstinline|printf| is a constant value, independent of its arguments),
%% but since we ignore its result in our code anyway, this is fine here.
%% This is by the way equivalent to the default behavior that was previously
%% inferred by \Eva{}, and the reason why it emitted warnings:
%% it is not a correct specification in the general case.
Furthermore, the program
is also found not to terminate. This means that every execution
either encounters the problem from line 14 in \lstinline|main_1.c|,
or an infinite loop somewhere else in the program.
We should, therefore, pay particular attention to this alarm.
Looking again at the header file \lstinline|skein.h|, we may now
notice that in function \lstinline|Skein_256_Init|,
the formal parameter for the desired hash length is named
\lstinline|hashBitLen|. This parameter should certainly be expressed in bits!
We were inadvertently asking for a 1-char hash of the message
since the beginning, and the test that we ran as our first
step failed to notice it.
Our first imprecise analysis did find it -- the same alarm that
pointed out the problem is present among the other alarms produced
by the first analysis. Note that because of the imprecisions,
the first analysis was not able to conclude that the uninitialized access
at \lstinline|hash[2]| was certain, making it much less noticeable
among the others.
The bug can be fixed by passing
8*HASHLEN instead of HASHLEN as the second argument of
\lstinline|Skein_256_Init|.
With this fix in place, the analysis
with \lstinline|-eva-slevel 100| produces no alarms (apart from the postconditions
of \verb+memcpy+ and \verb+memset+) and gives the following result:
\begin{logs}
Values for function main:
hash[0] $\in$ {40}
[1] $\in$ {234}
[2] $\in$ {138}
[3] $\in$ {230}
[4] $\in$ {134}
[5] $\in$ {120}
[6] $\in$ {37}
[7] $\in$ {35}
i $\in$ {8}
\end{logs}
Meanwhile, compiling and executing the fixed test produces the result:
\begin{logs}
40
234
138
230
134
120
37
35
\end{logs}
\section{Guaranteeing the absence of bugs}
\subsection{Generalizing the analysis to arbitrary messages of fixed length}
The analysis we have done so far is very satisfying because
it finds problems that are not detected by a C compiler or by testing.
The results of this analysis only prove
the absence of run-time errors\footnote{and the absence
of conditions that
{\em should} be run-time errors -- like the uninitialized access already
encountered} when the particular message that we
chose is being hashed, though.
It would be much more useful to have the assurance that there are
no run-time errors for any input message, especially since the library
might be under consideration for embedding in a device where anyone
(possibly a malicious user) will be able to choose the message
to hash.
A first generalization of the previous analysis is to include
in the subject matter
the hashing of all possible 80-character messages. We can do this
by separating the analyzed program in two distinct phases, the first
one being the construction of a generalized analysis context
and the second one being made of the sequence of function calls
that we wish to study:
\listinginputcaption{1}{\texttt{main\_2.c}}{tutorial/main_2.c}
From this point onward the program is no longer executable because
of the call to builtin primitives such as \lstinline|Frama_C_interval|.
We therefore dispense with the final calls to \lstinline|printf|,
since \Eva{} offers simpler ways to observe intermediate
and final results. Note that we included \verb|__fc_builtin.h|, a file
that comes from the Frama-C distribution and which
defines \lstinline|Frama_C_interval|. Running Frama-C on this
file (without \verb|main_1.c| in the same directory, to avoid having two
definitions for \verb|main|):
\begin{shell}
frama-c -eva-slevel 100 -eva *.c >log 2>&1
\end{shell}
This time, the absence of actual alarms is starting to be really interesting:
it means that it is formally excluded that the functions
\lstinline|Skein_256_Init|, \lstinline|Skein_256_Update|, and
\lstinline|Skein_256_Final| produce a run-time error when
they are used, in this order, to initialize a local variable
of type \lstinline|Skein_256_Ctxt_t| (with argument 64 for the size),
parse an arbitrary message and produce a hash in a local \lstinline|u08b_t|
array of size~8.
\subsection{Verifying functional dependencies}
If we had written a formal specification for function Skein, we would soon
have expressed that we expected it to modify the buffer \lstinline|hash|
that is passed to \lstinline|Skein_256_Final| (all 8 bytes of the buffer),
to compute the new contents of this buffer from the contents of the
input buffer \lstinline|msg| (all 80 bytes of it), and from nothing else.
During \Eva{}'s analysis, we have seen that all of the buffer \verb|hash|
was always modified in the conditions of the analysis: the reason is that
this buffer was uninitialized before the sequence of calls,
and guaranteed to be initialized after them.
We can get the complete list of locations that may be modified by each
function by adding the option \verb|-out| to the other options we were
already using. These locations are computed in a quick analysis that comes
after \Eva{}. In the results of this analysis, we find:
\begin{logs}