Skip to content
Snippets Groups Projects
2018-11-15-Frama-C-Eva-in-SATE-VI-with-Juliet-1.3.md 21.80 KiB
layout: post
author: André Maroneze
date: 2018-11-15 12:00 +0200
categories: Eva benchmarks
image:
title: "Frama-C/Eva in SATE VI with Juliet 1.3"

Frama-C chose to participate in the SATE VI Ockham Sound Analysis Criteria. With the upcoming Frama-C 18 (Argon) release, results can be reproduced using the open source version of the Eva plug-in. This post briefly presents SATE, the Juliet test suite, and the results obtained with Frama-C 18. We then describe the Frama-C SATE VI repository on Github, created to allow anyone to reproduce the results obtained with Frama-C/Eva. We also explain some patches that were applied to the test suites, and our choice of parametrization options for Eva.

SATE VI and Juliet 1.3

SATE (Static Analysis Tool Exposition), organized by NIST, is a comparative study of static analysis tools: toolmakers volunteer to participate by running their tools on the sets provided by NIST, and then by submitting their analysis scripts and results. The NIST SAMATE team then compiles a dense report, comparing and evaluating all participating tools.

For the analysis of C programs, there are two tracks: one in which all kinds of tools participate, called the Classic track, and another dedicated to sound analyzers, in the sense of the Ockham Sound Analysis Criteria.

For the Ockham track in SATE VI, we chose the Juliet 1.3 test suite for C/C++, which contains over 64,000 test cases. Some are written in C++, but most are written in C. Test suites are organized in directories related to CWEs (Common Weakness Enumerations): each directory has code samples containing violations of the CWE. For each test case, there are two variants: one with good code (which does not trigger the CWE), and another with bad code (which triggers the CWE in question). In our tests using Juliet, Frama-C is supposed to find all occurrences of bad code; it may, however, report some good code as bad.

CWEs handled by Frama-C/Eva

For the analysis with Frama-C/Eva, we chose the subset of CWEs in Juliet 1.3 for C/C++ which are handled by the analyzer. Click the arrow below to unfold the list.

CWEs in Juliet 1.3 handled by Frama-C/Eva (18 - Argon)
  • CWE121 - Stack Based Buffer Overflow
  • CWE122 - Heap Based Buffer Overflow
  • CWE123 - Write What Where Condition
  • CWE124 - Buffer Underwrite
  • CWE126 - Buffer Overread
  • CWE127 - Buffer Underread
  • CWE190 - Integer Overflow
  • CWE191 - Integer Underflow
  • CWE194 - Unexpected Sign Extension
  • CWE195 - Signed to Unsigned Conversion Error
  • CWE196 - Unsigned to Signed Conversion Error
  • CWE197 - Numeric Truncation Error
  • CWE369 - Divide by Zero
  • CWE415 - Double Free
  • CWE416 - Use After Free
  • CWE457 - Use of Uninitialized Variable
  • CWE464 - Addition of Data Structure Sentinel
  • CWE469 - Use of Pointer Subtraction to Determine Size
  • CWE475 - Undefined Behavior for Input to API
  • CWE562 - Return of Stack Variable Address
  • CWE587 - Assignment of Fixed Address to Pointer
  • CWE588 - Attempt to Access Child of Non Structure Pointer
  • CWE590 - Free Memory Not on Heap
  • CWE617 - Reachable Assertion
  • CWE665 - Improper Initialization
  • CWE690 - NULL Deref From Return
  • CWE758 - Undefined Behavior
  • CWE835 - Infinite Loop
  • CWE843 - Type Confusion

Note that some CWEs are very broad, such as CWE758 - Undefined Behavior. We do not claim Frama-C/Eva handles all cases which might be categorized under such CWEs; however, those considered in the Juliet 1.3 test suite have all their bad code samples correctly labeled.

Some examples of CWEs which are not considered by Frama-C/Eva include CWE426 - Untrusted Search Path, CWE90 - LDAP Injection and CWE780 - Use of RSA Algorithm Without OAEP. All three of them constitute functional-level weaknesses which do not cause runtime errors.

Some of the CWEs not currently considered by Frama-C/Eva could be handled via specific function annotations, but this would constitute a very ad hoc solution. Other plugins in the Frama-C platform might be better suited for these CWEs; e.g., using the WP plugin with function annotations, or the Secure Flow plugin with trusted and untrusted sinks and sources, might allow identifying the cases with API violations leading to weaknesses.

Results

The following table summarizes the results obtained with Frama-C 18 (Argon). The script that produces this table is presented in a later section.

total imprecise      :    751
total non-terminating:      3
total ok             :  44668
total timeout        :      0
total unsound        :      0
precision (excluding non-terminating): 0.983465

Each test case is labeled as one of the following cases:

  • ok: correctly labeled: either a good test without alarms, or a bad test with alarms;
  • imprecise: a good test with a false alarm;
  • unsound: a bad test without an alarm;
  • non-terminating: the program does not terminate; this should never happen, except in the case of CWE835 - Infinite Loop;
  • timeout: a test which did not finish within the allotted time (by default, 30 seconds). This never happens in our benchmarking rig, but slower or overloaded machines may see this happen. If so, then a larger timeout should be given and the test should be re-run.

The precision (98.3%) displayed in the last line is the number of ok tests divided by the total number of tests (ok + imprecise + unsound).

The improvements between Frama-C 17 (Chlorine) and 18 (Argon) are noteworthy, especially the fact that the memcpy and memset builtins are now released in the open source version. Other improvements in precision also contributed to remove some spurious warnings and to simplify the analysis script.

The remaining imprecisions are due to different situations, but most of them occur several times, due to reuse of the same patterns among different test cases. One source of imprecision is the snprintf function, whose specification is not precise enough to ensure that the output string is null-terminated. This leads to false alarms when that string is used later in the code. Another case is related to CWE369 - Divide by Zero, in which some tests use socket-related functions whose specifications do not allow for precise results in every case.

The Frama-C SATE VI Repository

The Frama-C SATE VI repository on Github allows anyone to run Frama-C on the Juliet 1.3 test suite for C/C++, to reproduce the results or to try different approaches, e.g. parametrizing the analysis differently, or trying a new abstract domain.

Software requirements are: Frama-C 18.0 (Argon), GNU Make (>= 4.0), the GNU parallel tool, Bash (>= 4.0) and wget (to download the Juliet archive from NIST's website).

The repository is structured in the following way:

  • A GNUmakefile allows bootstrapping the repository, by downloading the Juliet 1.3 for C/C++ archive from NIST's website and then decompressing it (Juliet 1.3 is not distributed in our repository). It also applies some patches to Juliet's sources, described in a later section;
  • the fc directory contains Frama-C scripts used to run the analysis and automatically evaluate its results. The main scripts are:
    • analyze.sh: sets the default analysis parameters; these differ from Frama-C's defaults in several ways, being stricter in some cases but less strict in others;
    • evaluate_case.sh: performs automatic evaluation of good and bad cases into the categories presented before;
  • the fc/patches directory contains some patches applied to Juliet 1.3 to fix known issues in a few tests. It also changes the way non-deterministic random numbers are used in some tests (details are presented later);
  • the C directory contains the extracted Juliet 1.3 archive, with its scripts and test cases;
  • finally, the oracles directory contains a versioned copy of the oracles produced by the analysis with Frama-C/Eva; this allows comparing different parametrizations, and following the evolution of the platform.

Reproducing the results

The overall process to use the repository is the following:

git clone https://github.com/Frama-C/SATE-VI.git
cd SATE-VI
make
cd C/testcases
make

The last make step takes a very long time (2+ hours, depending on your CPU and number of cores).

Note that, if Frama-C is not installed in the PATH, you can use the fc/frama-c-path.sh script to specify the directory where Frama-C is installed. This is especially useful to compare different versions of Frama-C.

After running make inside the C/testcases directory, a small summary with the number of tests performed in each subdirectory will be presented in textual form:

CWE121_Stack_Based_Buffer_Overflow/s01: 876 tests
    876  ok
CWE121_Stack_Based_Buffer_Overflow/s02: 1020 tests
   1020  ok
CWE121_Stack_Based_Buffer_Overflow/s03: 1020 tests
     34  imprecise
    986  ok
...

Running ./summarize_results.sh will also output this result. The stat_results.py script will produce an overall summary across all subdirectories, as presented in the Results section.

Re-running individual tests

Inside each directory in C/testcases/, it is possible to run a single test, by removing its extension and adding _good.res or _bad.res, e.g. if there is a file file.c inside directory CWE000/s01, then doing cd CWE000/s01 and make file_good.res will run the test and produce three files:

  • file_good: output of the analysis;
  • file_good.csv: CSV report of warnings produced during the analysis;
  • file_good.res: result of evaluate_case.sh applied to the test case report;

Note that for tests consisting of multiple files (each ending with a, b, c, etc.), the test case name is their prefix without the letter, e.g. for the test comprised of files file_42a.c and file_42b.c, the command is make file_42_good.res (or bad).

Finally, by replacing .res with .gui in the make command, it is possible to open the Frama-C GUI on the test case, to inspect it.

Patches applied to Juliet 1.3

In the Frama-C SATE VI repository, there are two sets of patches applied to Juliet 1.3 for C/C++:

fix-tests.patch

These are fixes in a small number of tests, due to 2 patterns which were found out to lead to undefined behavior.

The first fix consists in replacing an overflow check which accidentally introduces a signed overflow:

 /* FIX: Add a check to prevent an overflow from occurring */
-if (abs((long)data) <= (long)sqrt((double)CHAR_MAX))
+if (data <= LONG_MAX && labs(data) <= (long)sqrt((double)CHAR_MAX))
 {

data has type unsigned int. In the original code, the call to abs((long)data) is equivalent to abs((int)(long)data), since abs takes an int as argument. If int and long have the same length (e.g. as in x86 32-bit), then data can be larger than LONG_MAX (and also INT_MAX, since they have the same length), leading to a signed overflow in the call, and thus an undefined behavior. The patched version first checks that data is small enough, and then uses labs to avoid downcasting data.

The second fix concerns a set of files which had a local variable declared in a scope smaller than intended:

 static void goodG2B()
 {
     void * data;
+    int intBuffer = 8;
     /* Initialize data */
     data = NULL;
     {
     /* FIX: Point data to an int */
-        int intBuffer = 8;
         data = &intBuffer;
     }
     CWE843_Type_Confusion__short_68_goodG2BData = data;

In this set of test cases, intBuffer (and similar variables) was declared inside a block, and the value was later used outside of the scope of the variable, leading to undefined behavior. The fix consists in declaring the variable in the main scope of the function.

improve-testcasesupport.patch

This concerns mainly the RAND32 macro, used in Juliet to produce non-deterministic numbers. This macro is used in some test cases to assign values to variables of different types. The result is downcast to char or short whenever necessary. However, this downcast leads to warnings due to usage of option -warn-signed-downcast. This option is enabled in the analyze.sh script since some CWEs are related to it.