%% -------------------------------------------------------------------------- %% --- Analysis scripts %% -------------------------------------------------------------------------- \chapter{Analysis Scripts} \label{user-analysis-scripts} This chapter describes some tools and scripts shipped with \FramaC to help users setup and run analyses on large code bases. These scripts can also help dealing with unfamiliar code and automating analyses. \section{Requirements} These analysis scripts are chiefly based on the following tools: \begin{description} \item[Python]: most scripts are written using Python 3. Some scripts require features from specific Python versions, and perform version checks when starting. \item[GNU Make]: the main workflow for analysis scripts consists in using GNU Make (4.0 or newer) to compute dependencies and cache intermediate results. \item[Bash]: some scripts are written in Bash. \end{description} Besides those, a few tools are occasionally required by the scripts, such as Perl and GNU Parallel. \section{Usage} Most scripts are accessible via the \texttt{frama-c-script} command, installed along with \FramaC. Running this command without any arguments prints the list of commands, with a brief description of each of them. Some extra scripts are available by directly running them; in both cases, the actual scripts themselves are installed in Frama-C's \texttt{share} directory, underneath \texttt{analysis-scripts}. \subsection{General Framework} {\em Note}: while the analysis scripts are intended for usage in a wide variety of scenarios with different plug-ins, they currently focus on analyses with the \Value plug-in only. The main usage mode of \texttt{analysis-scripts} consists in creating a Makefile dedicated to the analysis of a C code base. This Makefile has three main purposes: \begin{enumerate} \item To separate the main analysis steps, saving partial results and logging output messages; \item To avoid recomputing unnecessary data when modifying analysis-specific options; \item To document analysis options and improve replayability, e.g. when iteratively fine-tuning the analysis in order to improve its results. \end{enumerate} The intended usage is as follows: \begin{enumerate} \item The user identifies a C code base on which they would like to run Frama-C; \item The user runs a script to interactively fill a template for the analysis, with the list of source files and required parameters (architecture, preprocessing flags, main function); \item The user edits and runs the generated Makefile, adjusting the analysis as needed and re-running \texttt{make}. \end{enumerate} Ideally, after modifying the source code or re-parametrizing the analysis, re-running \texttt{make} should be enough to obtain a new result. Section~\ref{sec:using-generated-makefile} details usage of the Makefile and presents an illustrative diagram. \subsection{Necessary Build Information} \label{sec:necessary-build-information} For an analysis with \FramaC, besides the source code, the following pieces of information are necessary: \begin{description} \item[machdep]: architectural information about system where the code will run: integer type sizes, compiler, OS, etc. See section~\ref{sec:normalize} for more details. \item[preprocessing flags]: options given to the C preprocessor, mainly macros (\texttt{-D}) and include directories (\texttt{-I}). \item[list of sources]: the actual list of source files that make a logical unit (e.g. a test case or a whole program), without duplicate function definitions. \item[main function]: the function where the analysis will start; it is often \texttt{main}, but not always. ({\em Note: \FramaC itself thes not require a \texttt{main} function, but plug-ins such as \texttt{Eva} do}.) \end{description} A project without such information is incomplete. In such cases, an alternative workflow is necessary. The next section presents some possibilities to retrieve such information. \subsection{Possible Workflows in the Absence of Build Information} \label{alternative-workflows} It is sometimes the case that the \FramaC user is not the developer of the code under analysis, and does not have full build information about it; or the code contains non-portable features or missing libraries which prevent \FramaC from parsing it. In such cases, these analysis scripts provide two alternative workflows, depending on how one prefers to choose their source files: {\em one at a time} or {\em all-in}, described below. \subsubsection{One at a time} In this workflow, the user starts from the entry point of the analysis: typically the \texttt{main} function of a program or a test case. Only the file defining that function is included at first. Then, using \texttt{make-wrapper} (described in section~\ref{sec:script-descriptions}), the user iteratively adds new sources as needed, so that all function definitions are included. \begin{itemize} \item Advantages: higher code coverage; faster preprocessing/parsing; and avoids including possibly unrelated files (e.g. for an alternative OS/architecture). \item Drawbacks: the iterative approach recomputes the analysis several times; also, it may miss files containing global initializers, which are not flagged as missing. \end{itemize} \subsubsection{All-in} In this workflow, the user adds {\em all} source files to the analysis, and if necessary removes some of them, e.g. when there are conflicting definitions, such as when multiple test files define a \texttt{main} function. \begin{itemize} \item Advantages: optimistic approach; may not require any extra iterations, if everything is defined, and only once. Does not miss global initializers, and may find issues in code which is not reachable (e.g. syntax-related warnings). \item Drawbacks: preprocesses and parses several files which may end up never being used; smaller code coverage; if parsing fails, it may be harder to find the cause (especially if due to unnecessary sources). \end{itemize} \subsection{Using a JSON Compilation Database (JCDB)} Independently of the chosen workflow, some partial information can be retrieved when CMake or Makefile scripts are available for compiling the sources. They allow the production of a JSON Compilation Database (\texttt{compile\_commands.json}, called JCDB for short; see related option in section~\ref{sec:preprocessing}). This leads to a different workflow: \begin{enumerate} \item Run CMake with the flag \texttt{-DCMAKE\_EXPORT\_COMPILE\_COMMANDS=1}, or install Build EAR (\url{https://github.com/rizsotto/Bear}) and run \texttt{bear make <targets>} instead of \texttt{make <targets>}. This will create a \texttt{compile\_commands.json} file. \item Run \texttt{frama-c-script list-files}. A list of the compiled files, along with files defining a \texttt{main} function, will be presented. \item Run \texttt{frama-c-script make-template} to create a template for \FramaC/\Value. Answer ``yes'' when asked about using the \texttt{compile\_commands.json} file. \end{enumerate} Ideally, the above approach should result in a working template. In practice, however, the compilation database may include extraneous sources (used to compile other files than the target object) and duplicate flags (e.g. when compiling the same source for different binary targets or test cases). Manual intervention may be necessary. \section{Using the generated Makefile} \label{sec:using-generated-makefile} The generated Makefile can be used to run one or several analyses. The diagram in Fig.~\ref{fig:analysis-scripts} summarizes its usage. Makefile targets and outputs are detailed in this section. \begin{figure}[htbp] \begin{center} \includegraphics[width=\textwidth]{analysis-scripts.pdf} \caption{Overview of the {\em analysis-scripts} workflow: the inputs on the left produce a GNUmakefile which is then used for parsing, analyzing and visualizing results.} \label{fig:analysis-scripts} \end{center} \end{figure} Each analysis is defined in a target, written by the user as follows: \texttt{<target>.parse: file1.c file2.c ...} That is, the target name (chosen by the user), suffixed with \texttt{.parse}, is defined as depending on each of its source files. Changes to any of these sources will trigger a recomputation of the AST. {\em Note:} the target name itself {\em cannot} contain slashes or dots. See also the {\em Technical Notes} section about some current limitations. Then, for each \texttt{.parse} target, a corresponding \texttt{.eva} target needs to be added to the \texttt{TARGETS} variable in the Makefile. This will run \Value on the test case. Each \texttt{.eva} target depends on its corresponding \texttt{.parse} target; if the sources change, the analysis must take into account the new AST. \subsection{Important Variables} Several Makefile variables are available to customize \FramaC; the main ones are presented below. \begin{description} \item[TARGETS]: as mentioned before, must contain the list of targets, suffixed with \texttt{.eva}. \item[CPPFLAGS]: preprocessing options, passed to \FramaC inside option \texttt{-cpp-extra-args}, when parsing the sources. \item[FCFLAGS]: extra command-line options given to \FramaC when parsing the sources and when running analyses. Typically, the options given to the \FramaC kernel. \item[EVAFLAGS]: extra command-line options given to \Value when running its analyses. \end{description} These variables are defined globally, but they can also be customized for each target; for instance, if a given target \texttt{t1} has a main function \texttt{test1} and requires a global macro \texttt{-DTEST1}, but target \texttt{t2}'s main function is \texttt{test2} and it requires \texttt{-DTEST2}, you can locally modify \texttt{FCFLAGS} and \texttt{CPPFLAGS} as follows: \begin{lstlisting} t1.parse: FCFLAGS += -main test1 t1.parse: CPPFLAGS += -DTEST1 t2.parse: FCFLAGS += -main test2 t2.parse: CPPFLAGS += -DTEST2 \end{lstlisting} \subsection{Predefined targets} The predefined targets below are the {\em raison d'ĂȘtre} of the generated Makefile; they speed up analyses, provide self-documentation, and enable quick iterations during parametrization of the analysis. \begin{description} \item[all (default target)]: the default target simply calls \texttt{<target>.eva}, for each \texttt{<target>} added to variable \texttt{TARGETS}. Does nothing once the analysis is finished and saved. \item[<target>.parse]: runs \FramaC on the specified target, preprocessing and parsing its source files. Produces a directory \texttt{<target>.parse} containing several logs, a pretty-printed version of the parsed code, and a \FramaC session file (\texttt{framac.sav}) to be loaded in the GUI or by other analyses. Does nothing if parsing already happened. \item[<target>.eva]: loads the parsed result (from the \texttt{.parse} target) and runs the \Value plug-in, with the options given in \texttt{EVAFLAGS}. If the analysis succeeds, produces a directory \texttt{<target>.eva} with the analysis results and a saved session file. Also creates a timestamped version of \texttt{<target>.eva}, to enable future comparisons between different parametrizations. The non-timestamped version corresponds to the latest (successful) analysis. If the analysis fails, tries to save a partial result in \texttt{<target>.eva.error} (when possible). \item[<target>.eva.gui]: loads the result of the corresponding \texttt{<target>.eva} session and opens it in the GUI. This allows inspecting the results of \Value. This target always opens the GUI, even when no changes have happened. \item[clean]: removes all results produced by the \texttt{.parse} and \texttt{.eva} targets. \end{description} \section{Script Descriptions} \label{sec:script-descriptions} The most useful commands are described below. Run \texttt{frama-c-script help} for more details and optional arguments. \begin{description} \item[make-template]: creates the initial Makefile, based on a template. This command creates a file named \texttt{GNUmakefile} with some hardcoded sections, some filled in interactively by the user, and comments indicating which parts may need change. Once created, it enables the general workflow mentioned earlier. \item[make-wrapper <target> <args>]: calls \texttt{make <target> <args>} with a special wrapper: when running \Value, upon encountering one of a few known error messages, suggests some actions on how to proceed. For instance, if a missing function definition is encountered when analyzing the code with \Value, the wrapper will look for its definition and, if found, suggest that its source code be added to the analysis. This script is meant to be used with the {\em one at a time} workflow describe in section~\ref{alternative-workflows}. \item[find-fun <fun>]: looks for possible declarations and definitions of function \texttt{<fun>}. Uses a heuristic that does not depend on \FramaC being able to parse the sources. Useful to find entry points and missing includes. \end{description} Other commands, only useful in a few cases, are described below. \begin{description} \item[configure <machdep>]: runs a \texttt{configure} script (based on Autoconf) with some settings to emulate a more portable system, removing optional code features that could prevent \FramaC from parsing the sources. Currently still depends partially on the host system, so many features are not disabled. \item[make-path] (for \FramaC developers): to be used when Frama-C is not installed in the PATH; adds a \texttt{frama-c-path.mk} file that is used by the Makefile generated by \texttt{make-template}. \item[flamegraph]: opens a {\em flamegraph}\footnote{% See \url{https://github.com/brendangregg/FlameGraph} for details about flamegraphs.} to visualize which functions take most of the time during analysis with \Value. \item[summary]: for monitoring the progression of multiple analyses defined in a single Makefile. Presents a summary of the analyses when done. Mostly useful for benchmarking or when dealing with several test cases. \end{description} The following commands require a JSON Compilation Database. \begin{description} \item[list-files]: lists all files in the given JCDB. Useful for filling out the Makefile template when running \texttt{make-template}. \item[normalize-jcdb]: converts absolute paths inside a \texttt{compile\_commands.json} file into relative paths (w.r.t. PWD) when possible. Used to allow moving/versioning the directory containing the JCDB file. \end{description} Finally, there is the following script, which is {\em not} available as a command in \texttt{frama-c-script}, since its usage scenario is very different. It is available at\\ \texttt{\$FRAMAC\_SHARE/analysis-scripts/creduce.sh}. \begin{description} \item[creduce.sh]: A script to help running the C-Reduce\footnote{% See \url{https://embed.cs.utah.edu/creduce} for more details.} tool to minify C programs causing crashes in \FramaC; useful e.g. when submitting a bug report to \FramaC, without needing to submit potentially confidential data. The script contains extensive comments about its usage. It is also described in a post\footnote{% {\em Debugging Frama-C analyses: better privacy with C-Reduce}, at \url{https://pub.frama-c.com/scripts/usability/2020/04/02/creduce.html}.} from the \FramaC blog. \end{description} To use the \texttt{creduce.sh} script, you need to have the C-Reduce tool installed in your path or in environment variable \texttt{CREDUCE}. \section{Practical Examples: Open Source Case Studies} The {\em open-source-case-studies} Git repository (OSCS for short), available at \url{https://git.frama-c.com/pub/open-source-case-studies}, contains several open-source C code bases parametrized with the help of analysis scripts. Each case study has its own directory, with a \texttt{GNUmakefile} defining one or more analysis targets. Due to the variety of test cases, OSCS provide practical usage examples of the \texttt{GNUmakefile} described in this chapter. They are periodically synchronized w.r.t. the public \FramaC repository (daily snapshots), so they may contain features not yet available in the major \FramaC releases. A few case studies may also contain legacy features which are no longer used; but overall, they provide useful examples and allow the user to tweak analysis parameters to test their effects. \section{Technical Notes} This section lists known issues and technical details which may help users understand some unintuitive behaviors. \paragraph{\em Changes to header files do not trigger a new parsing/analysis.} Currently, changes to included files (e.g. headers) are {\em not} tracked by the generated Makefile and may require running \texttt{make} with \texttt{-B} (to force recomputation of dependencies), or running \texttt{make clean} before re-running \texttt{make}. \paragraph{\em Why is the generated Makefile called \texttt{GNUmakefile}?} GNU Make, by default, searches for a file named \texttt{GNUmakefile} before searching for a \texttt{Makefile}. Thus, running \texttt{make} without arguments results in running the Makefile generated by \texttt{make-template}. You can rename it to \texttt{framac.mk} or something else, and then run it via \texttt{make -f framac.mk <targets>}. \paragraph{\em Most scripts are heuristics-based and offer no correctness/completeness guarantees.} In order to handle files {\em before} the source preparation step is complete (that is, before \FramaC is able to parse the sources into a complete AST), most commands use scripts based on syntactic heuristics, which were found to work well in practice but are easily disturbed by syntactic changes (e.g. whitespaces). \paragraph{\em Most commands are experimental.} These analysis scripts are a recent addition and subject to changes. They are provided on a best-effort basis.