---
layout: post
author: André Maroneze
date: 2018-11-15 12:00 +0200
categories: Eva benchmarks
image:
title: "Frama-C/Eva in SATE VI with Juliet 1.3"
---
Frama-C chose to participate in the SATE VI Ockham Sound Analysis Criteria. With the upcoming Frama-C 18 (Argon) release, the results can be reproduced using the open source version of the Eva plug-in. This post briefly presents SATE, the Juliet test suite, and the results obtained with Frama-C 18. We then describe the Frama-C SATE VI repository on GitHub, created to allow anyone to reproduce the results obtained with Frama-C/Eva. We also explain the patches that were applied to the test suite and our choice of parametrization options for Eva.
SATE VI and Juliet 1.3
SATE (Static Analysis Tool Exposition), organized by NIST, is a comparative study of static analysis tools: toolmakers volunteer to participate by running their tools on the sets provided by NIST, and then by submitting their analysis scripts and results. The NIST SAMATE team then compiles a dense report, comparing and evaluating all participating tools.
For the analysis of C programs, there are two tracks: one in which all kinds of tools participate, called the Classic track, and another dedicated to sound analyzers, in the sense of the Ockham Sound Analysis Criteria.
For the Ockham track in SATE VI, we chose the Juliet 1.3 test suite for C/C++, which contains over 64,000 test cases. Some are written in C++, but most are written in C. The test suite is organized into directories related to CWEs (Common Weakness Enumerations): each directory contains code samples with violations of the corresponding CWE. For each test case, there are two variants: one with good code (which does not trigger the CWE), and another with bad code (which triggers the CWE in question). In our tests using Juliet, Frama-C is expected to find all occurrences of bad code; it may, however, report some good code as bad.
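As a rough illustration of this good/bad structure, here is a hypothetical, simplified sketch (not an actual Juliet test case) of a pair of variants for CWE369 - Divide by Zero:

```c
/* Hypothetical sketch in the spirit of a Juliet test case: the bad variant
   may divide by zero, while the good variant guards against it. */
int divide_bad(int divisor) {
    return 100 / divisor;      /* bad: divisor may be zero; Eva reports an alarm */
}

int divide_good(int divisor) {
    if (divisor != 0)
        return 100 / divisor;  /* good: the division is guarded; no alarm */
    return 0;
}
```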
CWEs handled by Frama-C/Eva
For the analysis with Frama-C/Eva, we chose the subset of CWEs in Juliet 1.3 for C/C++ which are handled by the analyzer; the full list is given below.
CWEs in Juliet 1.3 handled by Frama-C/Eva (18 - Argon)
- CWE121 - Stack Based Buffer Overflow
- CWE122 - Heap Based Buffer Overflow
- CWE123 - Write What Where Condition
- CWE124 - Buffer Underwrite
- CWE126 - Buffer Overread
- CWE127 - Buffer Underread
- CWE190 - Integer Overflow
- CWE191 - Integer Underflow
- CWE194 - Unexpected Sign Extension
- CWE195 - Signed to Unsigned Conversion Error
- CWE196 - Unsigned to Signed Conversion Error
- CWE197 - Numeric Truncation Error
- CWE369 - Divide by Zero
- CWE415 - Double Free
- CWE416 - Use After Free
- CWE457 - Use of Uninitialized Variable
- CWE464 - Addition of Data Structure Sentinel
- CWE469 - Use of Pointer Subtraction to Determine Size
- CWE475 - Undefined Behavior for Input to API
- CWE562 - Return of Stack Variable Address
- CWE587 - Assignment of Fixed Address to Pointer
- CWE588 - Attempt to Access Child of Non Structure Pointer
- CWE590 - Free Memory Not on Heap
- CWE617 - Reachable Assertion
- CWE665 - Improper Initialization
- CWE690 - NULL Deref From Return
- CWE758 - Undefined Behavior
- CWE835 - Infinite Loop
- CWE843 - Type Confusion
Note that some CWEs are very broad, such as CWE758 - Undefined Behavior. We do not claim Frama-C/Eva handles all cases which might be categorized under such CWEs; however, those considered in the Juliet 1.3 test suite have all their bad code samples correctly labeled.
Some examples of CWEs which are not considered by Frama-C/Eva include CWE426 - Untrusted Search Path, CWE90 - LDAP Injection and CWE780 - Use of RSA Algorithm Without OAEP. All three of them constitute functional-level weaknesses which do not cause runtime errors.
Some of the CWEs not currently considered by Frama-C/Eva could be handled via specific function annotations, but this would constitute a very ad hoc solution. Other plugins in the Frama-C platform might be better suited for these CWEs; e.g., using the WP plugin with function annotations, or the Secure Flow plugin with trusted and untrusted sinks and sources, might allow identifying the cases with API violations leading to weaknesses.
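For instance, a purely hypothetical sketch of such an annotation (the API, enum and contract below are illustrative and not part of Juliet or of any real library) could require an RSA encryption function to be called with OAEP padding, letting WP check the precondition at each call site:

```c
/* Hypothetical API and contract, for illustration only. */
enum rsa_padding { RSA_PKCS1_V15, RSA_OAEP };

/*@ requires padding == RSA_OAEP; */
void hypothetical_rsa_encrypt(const unsigned char *msg, unsigned long len,
                              enum rsa_padding padding);
```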
Results
The following table summarizes the results obtained with Frama-C 18 (Argon). The script that produces this table is presented in a later section.
```
total imprecise : 751
total non-terminating: 3
total ok : 44668
total timeout : 0
total unsound : 0
precision (excluding non-terminating): 0.983465
```
Each test case is labeled as one of the following cases:

- `ok`: correctly labeled: either a good test without alarms, or a bad test with alarms;
- `imprecise`: a good test with a false alarm;
- `unsound`: a bad test without an alarm;
- `non-terminating`: the program does not terminate; this should never happen, except in the case of CWE835 - Infinite Loop;
- `timeout`: a test which did not finish within the allotted time (by default, 30 seconds). This never happens in our benchmarking rig, but slower or overloaded machines may see it happen; if so, a larger timeout should be given and the test re-run.

The precision (98.3%) displayed in the last line is the number of `ok` tests divided by the total number of tests (`ok` + `imprecise` + `unsound`).
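For reference, the precision figure can be recomputed from the totals above; a minimal sketch of the arithmetic (non-terminating tests are excluded, as in the summary):

```c
#include <stdio.h>

int main(void) {
    /* Totals from the summary above; non-terminating tests are excluded. */
    int ok = 44668, imprecise = 751, unsound = 0;
    double precision = (double)ok / (ok + imprecise + unsound);
    printf("precision = %f\n", precision);  /* prints 0.983465 */
    return 0;
}
```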
The improvements between Frama-C 17 (Chlorine) and 18 (Argon) are noteworthy, especially the fact that the `memcpy` and `memset` builtins are now released in the open source version. Other improvements in precision also contributed to removing some spurious warnings and to simplifying the analysis script.

The remaining imprecisions are due to different situations, but most of them occur several times, due to reuse of the same patterns among different test cases. One source of imprecision is the `snprintf` function, whose specification is not precise enough to ensure that the output string is null-terminated. This leads to false alarms when that string is used later in the code. Another case is related to CWE369 - Divide by Zero, in which some tests use socket-related functions whose specifications do not allow for precise results in every case.
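The following sketch (a hypothetical, simplified pattern, not an actual Juliet test) illustrates the `snprintf` situation: if the specification does not guarantee that `buf` is null-terminated, the later `strlen` call raises a false alarm.

```c
#include <stdio.h>
#include <string.h>

void log_value(int value) {
    char buf[32];
    snprintf(buf, sizeof(buf), "value: %d", value);
    /* If the snprintf specification does not guarantee null-termination of
       buf, Eva cannot prove that this strlen call stays within bounds and
       reports a false alarm. */
    size_t len = strlen(buf);
    (void)len;
}
```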
The Frama-C SATE VI Repository
The Frama-C SATE VI repository on GitHub allows anyone to run Frama-C on the Juliet 1.3 test suite for C/C++, to reproduce the results or to try different approaches, e.g. parametrizing the analysis differently, or trying a new abstract domain.
Software requirements are: Frama-C 18.0 (Argon), GNU Make (>= 4.0), the GNU `parallel` tool, Bash (>= 4.0) and `wget` (to download the Juliet archive from NIST's website).
The repository is structured in the following way:
- A `GNUmakefile` allows bootstrapping the repository, by downloading the Juliet 1.3 for C/C++ archive from NIST's website and then decompressing it (Juliet 1.3 is not distributed in our repository). It also applies some patches to Juliet's sources, described in a later section;
- the `fc` directory contains Frama-C scripts used to run the analysis and automatically evaluate its results. The main scripts are:
  - `analyze.sh`: sets the default analysis parameters; these differ from Frama-C's defaults in several ways, being stricter in some cases but less strict in others;
  - `evaluate_case.sh`: performs automatic evaluation of good and bad cases into the categories presented before;
- the `fc/patches` directory contains some patches applied to Juliet 1.3 to fix known issues in a few tests. It also changes the way non-deterministic random numbers are used in some tests (details are presented later);
- the `C` directory contains the extracted Juliet 1.3 archive, with its scripts and test cases;
- finally, the `oracles` directory contains a versioned copy of the oracles produced by the analysis with Frama-C/Eva; this allows comparing different parametrizations, and following the evolution of the platform.
The overall process to use the repository is the following:
```
git clone https://github.com/Frama-C/SATE-VI.git
cd SATE-VI
make
cd C/testcases
make
```
The last `make` step takes a very long time (2+ hours, depending on your CPU and number of cores).
Note that, if Frama-C is not installed in the PATH, you can use the `fc/frama-c-path.sh` script to specify the directory where Frama-C is installed. This is especially useful to compare different versions of Frama-C.
After running `make` inside the `C/testcases` directory, a small summary with the number of tests performed in each subdirectory will be presented in textual form:
```
CWE121_Stack_Based_Buffer_Overflow/s01: 876 tests
    876 ok
CWE121_Stack_Based_Buffer_Overflow/s02: 1020 tests
    1020 ok
CWE121_Stack_Based_Buffer_Overflow/s03: 1020 tests
    34 imprecise
    986 ok
...
```
Running `./summarize_results.sh` will also output this result. The `stat_results.py` script will produce an overall summary across all subdirectories, as presented in the Results section.
Re-running individual tests
Inside each directory in `C/testcases/`, it is possible to run a single test, by removing its extension and adding `_good.res` or `_bad.res`, e.g. if there is a file `file.c` inside directory `CWE000/s01`, then doing `cd CWE000/s01` and `make file_good.res` will run the test and produce three files:

- `file_good`: output of the analysis;
- `file_good.csv`: CSV report of warnings produced during the analysis;
- `file_good.res`: result of `evaluate_case.sh` applied to the test case report.
Note that for tests consisting of multiple files (each ending with `a`, `b`, `c`, etc.), the test case name is their prefix without the letter, e.g. for the test comprised of files `file_42a.c` and `file_42b.c`, the command is `make file_42_good.res` (or `bad`).
Finally, by replacing `.res` with `.gui` in the make command, it is possible to open the Frama-C GUI on the test case, to inspect it.
In the Frama-C SATE VI repository, there are two sets of patches applied to Juliet 1.3 for C/C++:
fix-tests.patch
These are fixes in a small number of tests, due to two patterns which were found to lead to undefined behavior.
The first fix replaces an overflow check that accidentally introduces a signed overflow:
```
 /* FIX: Add a check to prevent an overflow from occurring */
-if (abs((long)data) <= (long)sqrt((double)CHAR_MAX))
+if (data <= LONG_MAX && labs(data) <= (long)sqrt((double)CHAR_MAX))
 {
```
`data` has type `unsigned int`. In the original code, the call to `abs((long)data)` is equivalent to `abs((int)(long)data)`, since `abs` takes an `int` as argument. If `int` and `long` have the same length (e.g. as in x86 32-bit), then `data` can be larger than `LONG_MAX` (and also `INT_MAX`, since they have the same length), leading to a signed overflow in the call, and thus an undefined behavior.

The patched version first checks that `data` is small enough, and then uses `labs` to avoid downcasting `data`.
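A self-contained reduction of the pattern may make this clearer; the sketch below assumes an ILP32 target, where `int` and `long` are both 32 bits:

```c
#include <limits.h>
#include <math.h>
#include <stdlib.h>

/* Original check: when data > LONG_MAX (== INT_MAX on ILP32), the value
   cannot be represented by the signed types it is converted to, and Eva
   reports an alarm on the call to abs. */
int check_original(unsigned int data) {
    return abs((long)data) <= (long)sqrt((double)CHAR_MAX);
}

/* Patched check: the guard ensures data fits in a long, and labs (which
   takes a long) avoids any narrowing to int. */
int check_patched(unsigned int data) {
    return data <= LONG_MAX && labs(data) <= (long)sqrt((double)CHAR_MAX);
}
```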
The second fix concerns a set of files which had a local variable declared in a scope smaller than intended:
```
 static void goodG2B()
 {
     void * data;
+    int intBuffer = 8;
     /* Initialize data */
     data = NULL;
     {
         /* FIX: Point data to an int */
-        int intBuffer = 8;
         data = &intBuffer;
     }
     CWE843_Type_Confusion__short_68_goodG2BData = data;
```
In this set of test cases, `intBuffer` (and similar variables) was declared inside a block, and a pointer to it was later used outside the scope of the variable, leading to undefined behavior. The fix declares the variable in the main scope of the function.
improve-testcasesupport.patch
This concerns mainly the `RAND32` macro, used in Juliet to produce non-deterministic numbers. This macro is used in some test cases to assign values to variables of different types. The result is downcast to `char` or `short` whenever necessary. However, this downcast leads to warnings due to the usage of option `-warn-signed-downcast`. This option is enabled in the `analyze.sh` script since some CWEs are related to it.
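The kind of pattern involved looks roughly like this (a hypothetical sketch; `nondet_int` stands in for Juliet's `RAND32` macro):

```c
/* Hypothetical stand-in for Juliet's RAND32 macro: any non-deterministic
   32-bit value. */
extern int nondet_int(void);

void example(void) {
    /* With -warn-signed-downcast enabled, Eva emits an alarm here, since
       the 32-bit value may not fit into a char. */
    char c = (char)nondet_int();
    (void)c;
}
```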