user-analysis-scripts.tex

%% --------------------------------------------------------------------------
%% --- Analysis scripts
%% --------------------------------------------------------------------------

\chapter{Analysis Scripts}
\label{user-analysis-scripts}

This chapter describes some tools and scripts shipped with \FramaC to help
users setup and run analyses on large code bases. These scripts can also help
dealing with unfamiliar code and automating analyses.

\section{Requirements}

These analysis scripts are chiefly based on the following tools:

\begin{description}
\item[Python]: most scripts are written using Python 3. Some scripts require
  features from specific Python versions, and perform version checks when
  starting.
\item[GNU Make]: the main workflow for analysis scripts consists in using
  GNU Make (4.0 or newer) to compute dependencies and cache intermediate
  results.
\item[Bash]: some scripts are written in Bash.
\end{description}

Besides those, a few tools are occasionally required by the scripts, such as
Perl and GNU Parallel.

\section{Usage}

Most scripts are accessible via the \texttt{frama-c-script} command, installed
along with \FramaC. Running this command without any arguments prints the list
of commands, with a brief description of each of them. Some extra scripts are
available by directly running them; in both cases, the actual scripts
themselves are installed in Frama-C's \texttt{share} directory, underneath
\texttt{analysis-scripts}.

\subsection{General Framework}

{\em Note}: the analysis scripts are intended for usage in a wide variety
of scenarios, with different plug-ins. However, currently the scripts focus
on usage with the \Value plug-in.

The main usage mode of \texttt{analysis-scripts} consists in creating a
Makefile dedicated to the analysis of a C code base.

This Makefile has three main purposes:

\begin{enumerate}
\item To separate the main analysis steps, saving partial results and logging
  output messages;
\item To avoid recomputing unnecessary data (e.g. the AST) when modifying
  analysis-specific options;
\item To document analysis options and improve replayability, e.g. when
  iteratively fine-tuning results.
\end{enumerate}

The intended usage is as follows:

\begin{enumerate}
\item The user identifies a C code base on which they would like to run
  Frama-C;
\item The user runs a script to interactively fill a template for
  the analysis, with the list of source files and required parameters
  (architecture, preprocessing flags, main function);
\item The user edits and runs the generated Makefile, adjusting the
  analysis as needed and re-running \texttt{make}.
\end{enumerate}

Ideally, after modifying the source code or re-parametrizing the analysis,
re-running \texttt{make} should be enough to obtain a new result.

Section~\ref{sec:using-generated-makefile} details usage of the Makefile
and presents an illustrative diagram.

\subsection{Alternative Workflows in the Absence of Build Information}
\label{alternative-workflows}

It is sometimes the case that the \FramaC user is not the developer of the
code under analysis, and does not have full build information about it;
or the code contains non-portable features or missing libraries which prevent
\FramaC from parsing it.
In such cases, these analysis scripts provide two alternative workflows,
depending on how one prefers to choose their source files:
{\em one at a time} or {\em all-in}, described below.

\subsubsection{One at a time}

In this workflow, the user starts from the entry point
of the analysis: typically the \texttt{main} function of a program or a test
case. Only the file defining that function is included at first. Then, using
\texttt{make-wrapper} (described in section~\ref{sec:script-descriptions}),
the user iteratively adds new sources as needed,
so that all function definitions are included.

\begin{itemize}
\item Advantages: higher code coverage; faster preprocessing/parsing; and
avoids including possibly unrelated files
(e.g. for an alternative OS/architecture).
\item Drawbacks: the iterative approach recomputes the analysis several times;
  also, it may miss files containing global initializers, which are not flagged
  as missing.
\end{itemize}

\subsubsection{All-in}

In this workflow, the user adds {\em all} source files to the analysis, and
if necessary removes some of them, e.g. when there are conflicting definitions,
such as when multiple test files define a \texttt{main} function.

\begin{itemize}
\item Advantages: optimistic approach; may not require any extra iterations, if
  everything is defined, and only once. Does not miss global initializers, and
  may find issues in code which is not reachable (e.g. syntax-related warnings).
\item Drawbacks: preprocesses and parses several files which may end up never
  being used; smaller code coverage; if parsing fails, it may be harder to
  find the cause (especially if due to unnecessary sources).
\end{itemize}

\subsection{Using a JSON Compilation Database (JCDB)}

Independently of the chosen workflow, some partial information can be retrieved
when CMake or Makefile scripts are available for compiling the sources.
They allow the production of a JSON Compilation Database
(\texttt{compile\_commands.json}, called JCDB for short; see related option
in section~\ref{sec:preprocessing}). This leads to a different workflow:

\begin{enumerate}
\item Run CMake with the flag \texttt{-DCMAKE\_EXPORT\_COMPILE\_COMMANDS=1},
  or install Build EAR (\url{https://github.com/rizsotto/Bear} and run
  \texttt{bear make <targets>} instead of \texttt{make <targets>}. This will
  create a \texttt{compile\_commands.json} file.
\item Run \texttt{frama-c-script list-files}. A list of the compiled files,
  along with files defining a \texttt{main} function, will be presented.
\item Run \texttt{frama-c-script make-template} to create a template for
  \FramaC/\Value. Answer ``yes'' when asked about using the
  \texttt{compile\_commands.json} file.
\end{enumerate}

Ideally, the above approach should result in a working template. In practice,
however, the compilation database may include extraneous sources
(used to compile other files than the target object) and duplicate flags
(e.g. when compiling the same source for different binary targets or test
cases). Manual intervention may be necessary.

\section{Using the generated Makefile}
\label{sec:using-generated-makefile}

The generated Makefile can be used to run one or several analyses.
The diagram in Fig.~\ref{fig:analysis-scripts} summarizes its usage.
Makefile targets and outputs are detailed in this section.

\begin{figure}[htbp]
  \begin{center}
    \includegraphics[width=\textwidth]{analysis-scripts.pdf}
    \caption{Overview of the {\em analysis-scripts} workflow:
      the inputs on the left produce a GNUmakefile which is then used
      for parsing, analyzing and visualizing results.}
    \label{fig:analysis-scripts}
  \end{center}
\end{figure}

Each analysis is defined in a target, written by the user as follows:

\texttt{<target>.parse: file1.c file2.c ...}

That is, the target name (chosen by the user), suffixed with \texttt{.parse},
is defined as depending on each of its source files. Changes to any of these
sources will trigger a recomputation of the AST.

{\em Note:} the target name itself {\em cannot} contain slashes or dots.
See also the {\em Technical Notes} section about some current limitations.

Then, for each \texttt{.parse} target, a corresponding \texttt{.eva} target
needs to be added to the \texttt{TARGETS} variable in the Makefile.
This will run \Value on the test case.

Each \texttt{.eva} target depends on its corresponding \texttt{.parse} target;
if the sources change, the analysis must take into account the new AST.

\subsection{Important Variables}

Several Makefile variables are available to customize \FramaC; the main
ones are presented below.

\begin{description}
\item[TARGETS]: as mentioned before, must contain the list of targets,
  suffixed with \texttt{.eva}.
\item[CPPFLAGS]: preprocessing options, passed to \FramaC inside option
  \texttt{-cpp-extra-args}, when parsing the sources.
\item[FCFLAGS]: extra command-line options given to \FramaC when parsing the
  sources and when running analyses. Typically, the options given to the
  \FramaC kernel.
\item[EVAFLAGS]: extra command-line options given to \Value when running
  its analyses.
\end{description}

These variables are defined globally, but they can also be customized for
each target; for instance, if a given target \texttt{t1} has a main
function \texttt{test1} and requires a global macro \texttt{-DTEST1}, but
target \texttt{t2}'s main function is \texttt{test2} and it requires
\texttt{-DTEST2}, you can locally modify \texttt{FCFLAGS} and \texttt{CPPFLAGS}
as follows:

\begin{lstlisting}
t1.parse: FCFLAGS += -main test1
t1.parse: CPPFLAGS += -DTEST1
t2.parse: FCFLAGS += -main test2
t2.parse: CPPFLAGS += -DTEST2
\end{lstlisting}

\subsection{Predefined targets}

The predefined targets below are the {\em raison d'être} of the generated
Makefile; they speed up analyses, provide self-documentation, and enable
quick iterations during parametrization of the analysis.

\begin{description}
\item[all (default target)]: the default target simply calls
  \texttt{<target>.eva}, for each \texttt{<target>} added to variable
  \texttt{TARGETS}. Does nothing once the analysis is finished and saved.
\item[<target>.parse]: runs \FramaC on the specified target, preprocessing
  and parsing its source files. Produces a directory \texttt{<target>.parse}
  containing several logs, a pretty-printed version of the parsed code, and
  a \FramaC session file (\texttt{framac.sav}) to be loaded in the GUI or by
  other analyses. Does nothing if parsing already happened.
\item[<target>.eva]: loads the parsed result (from the \texttt{.parse} target)
  and runs the \Value plug-in, with the options given in \texttt{EVAFLAGS}.
  If the analysis succeeds, produces a directory \texttt{<target>.eva} with the
  analysis results and a saved session file.
  Also creates a timestamped version of \texttt{<target>.eva}, to enable
  future comparisons between different parametrizations. The non-timestamped
  version corresponds to the latest (successful) analysis.
  If the analysis fails, tries to save a partial result in
  \texttt{<target>.eva.error} (when possible).
\item[<target>.eva.gui]: loads the result of the corresponding
  \texttt{<target>.eva} session and opens it in the GUI. This allows inspecting
  the results of \Value. This target always opens the GUI, even when no
  changes have happened.
\item[clean]: removes all results produced by the \texttt{.parse} and
  \texttt{.eva} targets.
\end{description}

\section{Script Descriptions}
\label{sec:script-descriptions}

The most useful commands are described below.
Run \texttt{frama-c-script help} for more details and optional arguments.

\begin{description}
\item[make-template]: used to create the initial Makefile, based on a template.
  This command creates a file named \texttt{GNUmakefile} with some hardcoded
  sections, some filled in interactively by the user, and comments indicating
  which parts may need change. Once created, it enables the general workflow
  mentioned earlier.
\item[make-wrapper <target> <args>]: calls \texttt{make <target> <args>} with
  a special wrapper: when running \Value, upon encountering one of a few known
  error messages, suggests some actions on how to proceed.
  For instance, if a missing function definition is encountered when analyzing
  the code with \Value, the wrapper will look for its definition and, if found,
  suggest that its source code be added to the analysis. This script is meant
  to be used with the {\em one at a time} workflow describe in
  section~\ref{alternative-workflows}.
\item[find-fun <fun>]: looks for possible declarations and definitions
  of function \texttt{<fun>}. Uses a heuristic that does not depend on \FramaC
  being able to parse the sources. Useful to find entry points and missing
  includes.
\end{description}

Other commands, only useful in a few cases, are described below.

\begin{description}
\item[configure <machdep>]: used to run a \texttt{configure}
  script (based on Autoconf) with some settings to emulate a more portable
  system, removing optional code features that could prevent \FramaC from
  parsing the sources. Currently still depends partially on the host system,
  so many features are not disabled.
\item[make-path] (for \FramaC developers): to be used when Frama-C is not
  installed in the PATH; adds a \texttt{frama-c-path.mk} file that is used
  by the Makefile generated by \texttt{make-template}.
\item[flamegraph]: opens a {\em flamegraph}\footnote{%
  See \url{https://github.com/brendangregg/FlameGraph} for details about
  flamegraphs.} to help visualize which functions take most of the time
  during analysis with \Value.
\item[summary]: for monitoring the progression of multiple analyses defined
  in a single Makefile. Presents a summary of the analyses when done. Mostly
  useful for benchmarking or when dealing with several test cases.
\end{description}

The following commands require a JSON Compilation Database.

\begin{description}
\item[list-files]: lists all files in the given JCDB. Useful for filling
  out the Makefile template when running \texttt{make-template}.
\item[normalize-jcdb]: converts absolute paths inside a
  \texttt{compile\_commands.json} file into relative paths (w.r.t. PWD)
  when possible. Used to allow moving/versioning the directory containing
  the JCDB file.
\end{description}

Finally, there is the following script, which is {\em not} available as a
command in \texttt{frama-c-script}, since its usage scenario is very
different. It is available at\\
\texttt{\$FRAMAC\_SHARE/analysis-scripts/creduce.sh}.

\begin{description}
\item[creduce.sh]: A script to help running the C-Reduce\footnote{%
  See https://embed.cs.utah.edu/creduce for more details.} tool to minify
  C programs causing crashes in \FramaC; useful e.g. when submitting a bug
  report to \FramaC, without needing to submit potentially confidential data.
  The script contains extensive comments about its usage. It is also
  described in a post from the \FramaC blog.
\end{description}

To use the \texttt{creduce.sh} script, you need to have the C-Reduce tool
installed in your path or in environment variable \texttt{CREDUCE}.

\section{Practical Examples: Open Source Case Studies}

The {\em open-source-case-studies} Git repository (OSCS for short),
available at \url{https://git.frama-c.com/pub/open-source-case-studies},
contains several open-source C code bases parametrized with the help of
analysis scripts. Each case study has its own directory, with a
\texttt{GNUmakefile} defining one or more analysis targets.

Due to the variety of test cases, OSCS provide practical usage
examples of the \texttt{GNUmakefile} described in this chapter.
They are periodically synchronized w.r.t. the public \FramaC repository
(daily snapshots), so they may contain features not yet available in the
major \FramaC releases. A few may also contain legacy features which
are no longer used; but overall, they provide useful examples and allow
the user to tweak analysis parameters to test their effects.

\section{Technical Notes}

This section lists known issues and technical details which may help users
understand some unintuitive behaviors.

\paragraph{\em Changes to header files do not trigger a new parsing/analysis.}
Currently, changes to included files (e.g. headers) are {\em not}
tracked by the generated Makefile and may require running \texttt{make}
with \texttt{-B} (to force recomputation of dependencies), or running
\texttt{make clean} before re-running \texttt{make}.

\paragraph{\em Why is the generated Makefile called \texttt{GNUmakefile}?}
GNU Make, by default, searches for a file named \texttt{GNUmakefile} before
searching for a \texttt{Makefile}. Thus, running \texttt{make} without
arguments results in running the Makefile generated by \texttt{make-template}.
You can rename it to \texttt{framac.mk} or something else, and then run it
via \texttt{make -f framac.mk <targets>}.

\paragraph{\em Most scripts are heuristics-based and offer no
  correctness/completeness guarantees.} In order to handle files {\em before}
the source preparation step is complete (that is, before \FramaC is able to
parse the sources into a complete AST), most commands use scripts based on
syntactic heuristics, which were found to work well in practice but are
easily disturbed by syntactic changes (e.g. whitespaces).

\paragraph{\em Most commands are experimental.} These analysis scripts are a
recent addition and subject to changes. They are provided on a best-effort
basis.