diff --git a/doc/userman/Makefile b/doc/userman/Makefile index 106a989ba6da086d045ea4859f43f4aa95e61185..21c9edc7edc46b034c993e233b48a54072918532 100644 --- a/doc/userman/Makefile +++ b/doc/userman/Makefile @@ -4,7 +4,7 @@ C_CODE=$(wildcard examples/*.[ci]) VERSION_FILE=../../../../../VERSION ifeq ("$(wildcard $(VERSION_FILE))", "") VERSION_FILE=../../VERSION -FC_VERSION+=Chlorine+ +FC_VERSION+=Potassium+ else #internal mode FC_VERSION=$(shell cat $(VERSION_FILE)) @@ -13,6 +13,7 @@ FCLANG_VERSION= $(shell cat $(VERSION_FILE)) DEPS_MODERN=fclangversion.tex biblio.bib macros.tex \ introduction.tex \ + description.tex \ limitations.tex \ changes.tex \ $(C_CODE) \ @@ -68,7 +69,7 @@ fclangversion.tex: Makefile %.ml: %.mll ocamllex $< -%.pdf: %.tex +%.pdf: %.tex Makefile $(DEPS_MODERN) pdflatex $* makeindex $* bibtex $* diff --git a/doc/userman/description.tex b/doc/userman/description.tex index 56d55edda219cbc7aa30f80e23778ea6e0dacdd8..1070006b615deb656f2d592d407e8fcf7ea5f27d 100644 --- a/doc/userman/description.tex +++ b/doc/userman/description.tex @@ -1,18 +1,3 @@ -The \fclang plug-in intends to provide a full translation of C++ and \acslpp into the \framac internal representation, and from there be able to be analyzed by other \framac plug-ins. -This is a work in progress. -The following sections describe the limitations of the current implementation. -\begin{itemize} - \item The plug-in aims for the TBD version of C++ - \item \acslpp is described in the companion \acslpp reference manual, also a part of the \framac release. -\end{itemize} - - -Notes -\begin{itemize} - \item Which version of Clang? - \item Clang (8) support C++98 (except exported templates, later removed) and C++11 and current draft standard for C++1y - \item see https://clang.llvm.org/docs/UsersManual.html\#cxx for supported features in clang C++ -\end{itemize} \chapter{Installation} @@ -20,1131 +5,113 @@ TBD - installation instructions \chapter{Running the plug-in} -TBD - including command-line options +\section{C++ files} +Once installed the plugin is run automatically by Frama-C on any C++ files listed on the command-line. C++ files are identified by the filename suffix. The default C++ suffixes are these: + .cpp, .C, .cxx, .ii, .ixx, .ipp, .i++, .inl, .h, .hh -\chapter{Current implementation details and limitations} +Currently this set of suffixes is compiled in the plugin (in file frama\_Clang\_register.ml) and can only be changed by recompiling and +reinstalling the plugin. -\section{Preprocessing} +\section{Frama-clang executable} +The plug-in operates by invoking the executable \irg +on each file identified as C++, in turn. +For each file it produces a temporary output file containing the C AST, which is then given as input to frama-c. This executable is a single-file-at-a-time command-line executable only. +Various options control its behavior. -As a refresher, the C/C++ preprocessor (CPP) (cf. \url{https://gcc.gnu.org/onlinedocs/cpp/}) conceptually implements the following operations on a C++ source file: -\begin{itemize} - \item The input is read and broken it into a sequence of physical lines according to the line terminators (ASCII character sequences \\r, \\n, or \\r\\n). - \item Each C trigraph is replaced by its corresponding character. - \item Any backslash-whitespace-line-terminator sequence is removed and the line that it ends is combined with the following line. - \item Comments are replaced by single spaces. This requires tokenizing the input to avoid recognizing comment markers within strings as indicating a comment. Note that this allows block comments to hide line terminations. - \item The input text is divided into preprocessing tokens grouped in logical lines. Each preprocessor token becomes a compiler token (except where \#\# concatenation occurs). However, \acslb tokens are slightly different, as described below. - \item The source text is transformed according to any preprocessing directives contained within it. Each preprocessing directive must be contained within one logical line - -\end{itemize} -The result is a sequence of preprocessing tokens that is passed on to the -remaining compiler phases. +The file-system path identifying the executable is provided by the \textbf{-cxx-clang-command <cmd>} +option and is \irg by default. The path may be absolute; if it is a relative path, it is found relative to the TODO directory. [ TODO - check this] -\subsection{Trigraphs} -TBD - are these supported by clang? +\section{Frama-clang options} -\subsection{Preprocessor tokens} -Preprocessor tokens (per CPP) belong to one of five categories. White space (space, tab, TBD) serves only to separate tokens; it is not needed between tokens whose concatenation is not a single token. Line terminators also separate tokens and also delineate certain features: preprocessing directives and string literals do not extend over more than one (logical) line. -\begin{itemize} - \item identifiers: character sequences that match the regular expression \texttt{[a-zA-Z\_][a-zA-Z\_0-9]*} . Dollar signs are allowed as non-digit identifier characters if the clang option \texttt{-fdollars-in-identifiers} is used. (TBD - on by default?) - \item number: character sequences that match the regular expression \texttt{[.]?[0-9]([0-9a-zA-Z.]|[eEpP][+-]))*} . (TBD - dollar signs also allowed?) - \item string literals: character sequence enclosed in " " or ' ' or < >, with \textbackslash " for " in a double-quoted literal (that is not a header file name) and \' for ' in a single-quoted literal. - \item punctuator: all single non-alphanumeric printable characters except \@, \# and `, and all multi-punctuation sequences that meaningful to C/C++ (e.g. >>>= ), but not an arbitrary multi-punctuation character sequence. - \\item other tokens: \@, \#, ` and all non-ASCII single characters. -\end{itemize} -TBD - unicode characters -Note that not all preprocessor tokens are valid C/C++ parser tokens. Tokens in the other category have no meaning to C/C++ and the \texttt{number} category allows many sequences that are not legal C/C++ numeric tokens. These tokens will generally provoke compiler errors. For example in C/C++, 0..2 is one token and is not interpreted as two consecutive numeric tokens. +The options known to the \fcl plugin are processed by the \fc kernel when listed on the \fc command-line. +The use of the frama-c command-line is described in the core frama-c +user guide. +There are many kernel options that affect all plugins and many options specific to frama-clang. +The command \\ +\lstinline|frama-c -kernel-h| \\ +shows all kernel options; the command\\ +\lstinline|frama-c -fclang-h| \\ +shows all \fcl-specific options. -\acsl and \acslpp have slightly different tokens than the above, so the preprocessor tokens need to be re-tokenized in some cases: +The most important of the options are these: \begin{itemize} - \item The \@ token is a whitespace character in \acslb. - \item There are some \acslb multi-character punctuator tokens that are not - preprocessor tokens: - \begin{itemize} - \item[] $==>$ (logical implies) - \item[] $-->$ (bit-wise implies) - \item[] $<==>$ (logical equality) - \item[] $<-->$ (bit-wise equality) - \item[] TBD any more? - \end{itemize} - These \acslb tokens need to be assembled from multiple CPP tokens (and those CPP tokens must not be separated by white space) - \item A CPP numeric token that contains .. will not be a legal C/C++ number, but may be multiple legal \acslb tokens with the .. representing the range operator. For example, the single CPP token \texttt{0..last} is retokenized for \acslb as if it were written \texttt{0 .. last} . - \item \acslb allows certain built-in non-ASCII symbols, namely - \begin{itemize} - \item[] $\forall$ (unicode 0x2200) - universal quantifier - \item[] $\exists$ (unicode 0x2203) - existential quantifier -% \item[] $\equal$ (unicode 0x2261) - equality -% \item[] $\notequal$ (unicode 0x2262) - inequality - \item[] $\leq$ (unicode 0x2264) - less than or equal - \item[] $\geq$ (unicode 0x2265) - greater than or equal -% \item[] $\minus$ (unicode 0x2212) - minus -% \item[] $\implies$ (unicode 0x21D2) - implies -% \item[] $\equiv$ (unicode 0x21D4) - equivalence - \item[] $\bigwedge$ (unicode 0x2227) - logical and - \item[] $\bigvee$ (unicode 0x2228) - logical or - \item[] $\neg$ (unicode 0x00AC) - logical negation -% \item[] $\logicalxor$ (unicode 0x22BB) - logical inequivalence - \item[] $\subset$ (unicode 0x2208) - subset -% \item[] $\Boolean$ (unicode 0x1D539) - set of booleans -% \item[] $\Integer$ (unicode 0x2124) - set of integers -% \item[] $\Real$ (unicode 0x211D) - set of reals - \item TBD - more - ensure unicode equivalent - \end{itemize} + \item \option{-cpp-extra-args <string>} -- the single string argument to this option is appended to the command-line when + \lstinline|frama-clang| is invoked internally. It is particularly + important for adding include directories (\lstinline|-I|) and + other options to be passed on to the clang compiler or the \irg executable. + Multiple instances of this option have a cumulative effect (rather + than later instances replacing earlier ones). [TODO - check this] + \item \option{-machdep <arg>} -- TODO + \item \option{-arg <arg>} -- TODO + \item \option{-fclang-msg-key <categories>} -- TODO + \item \option{-fclang-warn-key <categories>} -- TODO + \item \option{-fclang-verbose <n>} -- TODO + \item \option{-fclang-debug <n>} -- TODO + \item TODO \end{itemize} -\subsection{Preprocessor directives} -A preprocessing directive consists of a line (after the previous preprocessing phaseshave been completed) that begins with optional white space, the '\#' character, additional optional white space, and a preprocessor directive identifier. -The preprocessing language contains a fixed set of preprocessing directive identifiers: -\begin{itemize} - \item \texttt{define}, \texttt{undef} - \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} - \item \texttt{warning}, \texttt{error} - \item \texttt{include} - \item \texttt{line} - \item \texttt{pragma} -\end{itemize} -In addition, identifiers that have been defined (by a \#define directive) as macros are expanded according to the macro expansion rules (not described here). +\section{Include directories} -Now \acslb annotations are contained in C/C++ comments. -Consequently, any directives contained in the annotation are not seen when the source file is processed simply as a C/C++ source file. However, the effect of some directives lasts until the end of the source file. -Accordingly, \acslpp imposes constraints on the directives that may be present within annotations: -\begin{itemize} - \item \texttt{define} and \texttt{undef} are not permitted in an annotation - \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} are permitted but must be completely nested within the annotation in which they appear (an \#endif and its corresponding \#if/\#ifdef/\#ifndef must be in the same annotation comment.) - \item \texttt{warning} and \texttt{error} are permitted - \item \texttt{include} is permitted, but will cause errors if it contains, as is almost always the case, either \texttt{define} or \texttt{pragma} directives - \item \texttt{line} - TBD - \item \texttt{pragma} is not permitted -\end{itemize} +TODO +\section{32 and 64-bit targets} +TODO -\section{\acslpp constructs} -\textit{This section is not yet written} +\chapter{Running the frama-clang front-end standalone} -%This chapter is the core of this manual and describes how to use the \fclang -%plug-in. +In normal use within \fc, the \irg executable is +invoked automatically. However, it can also be run standalone. +In this mode it accepts command-line options and a single file; +it produces a C AST representing the translated C++, in a text format +of the CIL (C Intermediate language). [TODO - or is it CABS] +\section{Clang options} -%% You can still call the option \shortopt{e-acsl-help} to get the list of -%% available options with few lines of documentation. -%First, Section~\ref{sec:simple} shows how to run the plug-in on a toy example, -%compile the generated code with the \gcc compiler and detect invalid -%annotations at runtime. Further, Section~\ref{sec:wrapper} describes \eacslgcc\ -%-- a convenience wrapper script around \framac and \gcc. The aim of this script -%is to simplify instrumentation and compilation of the instrumented code. -%Section~\ref{sec:exec-env} gives additional details on the execution of the -%generated code. Next, Section~\ref{sec:incomplete} focuses on how to deal with -%incomplete programs, \emph{i.e.} in which some functions are not defined or in -%which there are no main function. Section~\ref{sec:combine} explains how to -%combine the \eacsl plug-in with other \framac plug-ins. Finally, -%Section~\ref{sec:custom} introduces how to customize the plug-in, and -%Section~\ref{sec:verbose} details the verbosing policy of the plug-in. +Frama-Clang is built on the Clang C++ parser. +As such, the \irg executable accepts the clang +compiler options and passes them on to clang. There are many of these. +Many of these are irrelevant to \fcl as they relate to lstinline|framaCIRGen| +code generation, whereas \fcl only uses clang for parsing, name +and type resolution. +You can see a list of options by running +\lstinline|framaCIRGen -help| + +The most significant options for \irg are these: +\begin{itemize} + \item \option{-I} -- adds a directory to the include file search path + \item \option{-o <file>} -- specifies the location of the output file (in this case, the file to contain the generated AST) + \item \option{-std=...} -- the input programming language (default is \lstinline|-std=c++11|) + \item \option{-w} -- suppress warnings + \item \option{-{-}version} -- print version information +\end{itemize} + +\section{Frama-Clang specific options} + +There are also options that are specific to \fcl. +These are listed below: +\begin{itemize} + \item \option{-{-}stop-annot-error} -- if set, then parsing stops on the first error; default is off + \item \option{-{-}gen-impl-method} -- TODO + \item \option{-{-}exit-code} -- if set, then the exit code of \irg is 1 if errors occur; this is not the default because then \lstinline|frama-c| would terminate upon + any error in \lstinline|framaCIRGen| + \item \option{-{-}no-exit-code} -- disables \option{-{-}exit-code} + \item \option{-{-}info=<n>} -- sets the level of informational messages to \textbf{n}; 0 is completely quiet; increasing values are + more verbose. \option{-{-}info} sets the level to 1; + \option{-{-}no-info} sets the level to 0. The \lstinline|frama-c| option \option{-fclang-msg-key=parse} is equivalent to setting a value of 1. + \item \option{-{-}warn=<n>} -- sets the level of parser warning messages to \textbf{n}; 0 is completely quiet; increasing values are +more verbose. \option{-{-}warn} sets the level to 1; +\option{-{-}no-warnyyp} sets the level to 0. The \lstinline|frama-c| option \option{-fclang-warn-key=parse} is equivalent to setting a value of 1. + + \item \option{-{-}debug=<n>} -- sets the level of parser debug messages to \textbf{n}; 0 is completely quiet; increasing values are +more verbose. \option{-{-}debug} sets the level to 1; +\option{-{-}no-debug} sets the level to 0. The \lstinline|frama-c| option \option{-fclang-debug=<n>} is equivalent to setting a value of \textbf{n}. +In particular, a debug value of 1 shows the command-line that invokes +\irg. +\item TODO: -b -x -o -v + +\end{itemize} -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\section{Simple Example} % <<< -%\label{sec:simple} -% -%This section is a mini-tutorial which explains from scratch how to use the -%\eacsl plug-in to detect whether an \eacsl annotation is violated at runtime. -% -%% All tool invocations and code listings shown in this section assume a Linux -%% distribution with system-wide \framac installation. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Running \eacsl} -%\label{sec:run} -% -%Consider the following program containing two \eacsl assertions such that -%the first assertion is valid, whereas the second one is not. -% -%\listingname{first.i} -%\cinput{examples/first.i} -% -%Running \eacsl on \texttt{first.i} consists of adding \shortopt{e-acsl} option -%to the \framac command line: -%\begin{shell} -%\$ frama-c -e-acsl first.i -%[kernel] Parsing FRAMAC_SHARE/libc/__fc_builtin_for_normalization.i (no preprocessing) -%[kernel] Parsing FRAMAC_SHARE/e-acsl/e_acsl.h (with preprocessing) -%[kernel] Parsing FRAMAC_SHARE/e-acsl/e_acsl_gmp.h (with preprocessing) -%[kernel] Parsing FRAMAC_SHARE/e-acsl/e_acsl_mmodel_api.h (with preprocessing) -%[kernel] Parsing first.i (no preprocessing) -%[e-acsl] beginning translation. -%<skip a warning when translating the Frama-C libc> -%[e-acsl] translation done in project "e-acsl". -%\end{shell} -% -%Even though \texttt{first.i} has already been pre-processed, \eacsl first asks -%the \framac kernel to process and combine it against several header files -%provided by the \eacsl plug-in. Further \eacsl translates the annotated code -%into a new \framac project named \texttt{e-acsl}\footnote{The notion of -%\emph{project} is explained in Section 8.1 of the \framac user -%manual~\cite{userman}.}. By default, the option \shortopt{e-acsl} does nothing -%more. It is however possible to have a look at the generated code in the -%\framac GUI. This is also possible through the command line thanks to the -%kernel options \shortopt{then-last} and \shortopt{print}. The former -%switches to the last generated project, while the latter pretty prints the \C -%code~\cite{userman}: -% -%\input{examples/instrumented_first.c} -% -%As you can see, the generated code contains additional type declarations, -%constant declarations and global \acsl annotations not present in the initial -%file \texttt{first.i}. They are part of the \eacsl monitoring library. You can -%safely ignore them for now. The translated \texttt{main} function of -%\texttt{first.i} is displayed at the end. After each \eacsl annotation, -%a call to \texttt{\_\_e\_acsl\_assert}\codeidx{e\_acsl\_assert} has been added. -% -%\begin{minipage}{\linewidth} -%\begin{ccode} -% /*@ assert x == 0; */ -% __e_acsl_assert(x == 0,(char *)"Assertion",(char *)"main",(char *)"x == 0",3); -% /*@ assert x == 1; */ -% __e_acsl_assert(x == 1,(char *)"Assertion",(char *)"main",(char *)"x == 1",4); -%\end{ccode} -%\end{minipage} -% -%Each call to \texttt{\_\_e\_acsl\_assert}\codeidx{e\_acsl\_assert}, function -%defined by the \eacsl monitoring library, dynamically verifies the validity of -%the corresponding assertion. In other words, it checks that its first argument -%(here \texttt{x == 0} or \texttt{x == 1}) is not equal to 0 and fails -%otherwise. The extra arguments are used to display error reports as shown in -%Section~\ref{sec:exec}. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Executing the generated code} -%\label{sec:exec} -% -%Using \shortopt{ocode} \framac~\cite{userman} option, the code generated by the -%\eacsl plug-in can be redirected into a file as follows: -%\begin{shell} -%\$ frama-c -e-acsl first.i -then-last -print -ocode monitored_first.c -%\end{shell} -% -%\framac uses architecture-dependent configuration which -%affects sizes of integer types, endianness and potentially other features. It -%can be seen that the code generated from \texttt{first.i} (shown in the -%previous section) defines \C type \texttt{size\_t} as \texttt{unsigned int}, whereas -%in 64-bit architectures \texttt{size\_t} is typically defined as -%\texttt{unsigned long}. Architecture used during \framac translation is -%controlled through \framac \shortopt{machdep} option that specifies the -%architecture type to use during translation. The default value of -%\shortopt{machdep} is \texttt{x86\_32} (a generic 32-bit x86 architecture). -%Note that since code generated by \eacsl is aimed at being compiled it is -%important that the architecture used by \framac matches the architecture -%corresponding to your compiler and your system. For instance, in a 64-bit -%machine you should also pass -%\shortopt{machdep} \texttt{x86\_64} option as follows: -%\begin{shell} -% \$ frama-c -machdep x86_64 -e-acsl first.i -then-last \ -% -print -ocode monitored_first.c -%\end{shell} -% -%This file can be compile with a \C compiler (for instance \gcc), as follows: -% -%\lstset{escapechar=£} -%\begin{shell} -%\$ gcc -c -Wno-attributes monitored_first.c -%\end{shell} -% -%However, creating an executable through a proper invokation to \gcc is painful -%and is not documented. Instead, \eacsl comes with a companion wrapper script for -%this purpose. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{\eacsl Wrapper Script} % <<< -%\label{sec:wrapper} -% -%\begin{important}\index{Gcc} -% Presently, \eacslgcc is a recommended way of running the \eacsl plug-in. -% This manual further describes features of the \eacsl plug-in using \eacslgcc -% as its main driver for source code instrumentation and compilation of the -% instrumented programs. -%\end{important} -% -%In this section we describe \eacslgcc and provide several examples of its use. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\subsection{Standard Usage} -% -%The default behaviour of \eacslgcc is to instrument an annotated C program -%with runtime assertions via a run of \framac. -% -%\begin{shell} -% \$ e-acsl-gcc.sh -omonitored_first.i first.i -%\end{shell} -% -%The above command instruments \texttt{first.i} with runtime assertions and -%outputs the generated code to \texttt{monitored\_first.i}. Unless the name of -%the output file is specified via the \shortopt{o} option (i.e., -%\shortopt{-omonitored\_first.i} in the above example), the generated -%code is placed to a file named \texttt{a.out.frama.c}. -% -%Compilation and Linking of the original and the instrumented sources -%is enabled using the \shortopt{c} option. For instance, command -% -%\begin{shell} -% \$ e-acsl-gcc.sh -c -omonitored_first.i first.i -%\end{shell} -% -%instruments the code in \texttt{first.i}, outputs it to -%\texttt{monitored\_first.i} and produces 2 executables: \texttt{a.out} and -%\texttt{a.out.e-acsl}, such that \texttt{a.out} is an executable compiled from -%\texttt{first.i}, while \texttt{a.out.e-acsl} is an executable compiled from -%\texttt{monitored\_first.i} generated by \eacsl. -% -%You can execute the generate binaries, in particular \texttt{a.out.e-acsl} which -%detects at runtime the incorrect annotation. -%\begin{shell} -%\$ ./a.out.e-acsl -%Assertion failed at line 4 in function main. -%The failing predicate is: -%x == 1. -%Aborted -%\$ echo \$? -%134 -%\end{shell} -%The execution aborts with a non-zero exit code (134) and an error message -%indicating \eacsl annotation that has been violated. There is no output for the -%valid \eacsl annotation. That is, the run of an executable generated from the -%instrumented source file (i.e., \texttt{monitored\_first.i}) detects that the -%second annotation in the initial program is invalid, whereas the first -%annotation holds for this execution. -% -%Named executables can be created using the \shortopt{O} option. This is such -%that a value supplied to the \shortopt{O} option is used to name the executable -%generated from the original program and the executable generated from the -%\eacsl-instrumented code is suffixed \texttt{.e-acsl}. -% -%For example, command -%\begin{shell} -% \$ e-acsl-gcc.sh -c -omonitored_first.i -Omonitored_first first.i -%\end{shell} -%names the executables generated from \texttt{first.i} and -%\texttt{monitored\_first.i}: \texttt{monitored\_first} and -%\texttt{monitored\_first.e-acsl} respectively. -% -%The \eacslgcc \shortopt{C} option allows to skip the instrumentation step and -%compile a given program as if it was already instrumented by the \eacsl -%plug-in. This option is useful if one makes changes to an already -%instrumented source file by hand and only wants to recompile it. -% -%\begin{shell} -% \$ e-acsl-gcc.sh -C -Omonitored_first monitored_first.i -%\end{shell} -% -%The above command generates a single executable named -%\texttt{monitored\_first.e-acsl}. This is because \texttt{monitored\_first.i} is -%considered to be instrumented code and thus no original program is provided. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\subsection{Customising \framac and \gcc Invocations} -% -%By default \eacslgcc uses \T{frama-c} and \T{gcc} executables by looking them -%up in a system's path. If either of the tools are not found the script fails -%with an appropriate error message. Paths to \framac and \gcc executables can -%also be passed using \shortopt{I} and \shortopt{G} options -%respectively, for instance as follows: -% -%\begin{shell} -% \$ e-acsl-gcc.sh -I/usr/local/bin/frama-c -G/usr/bin/gcc6 foo.c -%\end{shell} -% -%Runs of \framac and \gcc issued by \eacslgcc can further be customized by using -%additional flags. \framac flags can be passed using the \shortopt{F} option, -%while \shortopt{l} and \shortopt{e} options allow for passing additional -%pre-processor and linker flags during \gcc invocations. For example, command -% -%\begin{shell} -% \$ e-acsl-gcc.sh -c -F"-unicode" -e"-I/bar -I/baz" foo.c -%\end{shell} -% -%yields an instrumented program \texttt{a.out.frama.c} where ACSL formulas -%are output using the UTF-8 character set. Further, during the compilation of -%\texttt{a.out.frama.c} \gcc adds directories \texttt{/bar} and -%\texttt{/baz} to the list of directories to be searched for header files. -% -%It is worth mentioning that \eacslgcc implements several convenience flags for -%setting some of the \framac options. For instance, \shortopt{E} can be used to -%pass additional arguments to the \framac pre-processor (see -%Section~\ref{sec:eacsl-gcc:doc}). -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\subsection{Verbosity Levels and Output Redirection} -% -%By default \eacslgcc prints the executed \framac and \gcc commands and pipes -%their output to the terminal. The verbosity of \framac output can be changed -%using \shortopt{v} and \shortopt{d} options used to pass values to -%\shortopt{verbose} and \shortopt{debug} flags of \framac respectively. For -%instance, to increase the verbosity of plug-ins but suppress the verbosity of -%the \framac kernel while instrumenting \texttt{foo.c} one can use the following -%command: -% -%\begin{shell} -% \$ e-acsl-gcc.sh -v5 -F"-kernel-verbose 0" foo.c -%\end{shell} -% -%Verbosity of the \eacslgcc output can also be reduced using the \shortopt{q} -%option that suppresses any output except errors or warnings issued by \gcc, -%\framac, and \eacslgcc itself. -% -%The output of \eacslgcc can also be redirected to a specified file using the -%\shortopt{s} option. For example, the following command redirects all -%output during instrumentation and compilation of \texttt{foo.c} to the -%file named \texttt{/tmp/log}. -% -%\begin{shell} -% \$ e-acsl-gcc.sh -c -s/tmp/log foo.c -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\subsection{Architecture Dependency} -% -%As pointed out in Section~\ref{sec:exec} the execution of a \C program depends -%on the underlying machine architecture it is executed on. The program must be -%compiled on the very same architecture (or cross-compiled for it) for the -%compiler being able to generate a correct binary. -% -%\framac makes assumptions about the machine architecture when analyzing source -%code. By default, it assumes an X86 32-bit platform, but it can be customized -%through \shortopt{machdep} switch~\cite{userman}. This option is of primary -%importance when using the \eacsl plug-in: it must be set to the value -%corresponding to the machine architecture which the generated code will be -%executed on, otherwise the generated program is likely to fail to link against -%the \eacsl library. Note that the library is built using the default -%machine architecture. -% -%One of the benefits of using the wrapper script is that it makes sure that -%\framac is invoked with the correct \shortopt{machdep} option. By default -%\eacslgcc uses \shortopt{machdep gcc\_x86\_64} for 64-bit architectures and -%\shortopt{machdep gcc\_x86\_32} for 32-bit ones. Compiler invocations are -%further passed \shortopt{m64} or \shortopt{m32} options to generate code using -%64-bit or 32-bit ABI respectively. -% -%At the present stage of implementation \eacsl does not support generating -%executables with ABI different to the default one. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\subsection{Documentation}\label{sec:eacsl-gcc:doc} -% -%For full up-to-date documentation of \eacslgcc see its man page: -% -%\begin{shell} -% \$ man e-acsl-gcc.sh -%\end{shell} -% -%Alternatively, you can also use the \shortopt{h} option of \eacslgcc: -% -%\begin{shell} -% \$ man e-acsl-gcc.sh -h -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Execution Environment of the Generated Code} %<<< -%\label{sec:exec-env} -% -%The environment in which the code is executed is not necessarily the same as -%the one assumed by \framac. You should take care of that when running the \eacsl -%plug-in and when compiling the generated code with \gcc. In this aspect, the -%plug-in offers a few possibilities of customization. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Runtime Errors in Annotations} -%\label{sec:runtime-error} -%\index{Runtime Error} -% -%The major difference between \acsl~\cite{acsl} and \eacsl~\cite{eacsl} -%specification languages is that the logic is total in the former while it is -%partial in the latter one: the semantics of a logic construct $c$ denoting a \C -%expression $e$ -%is undefined if $e$ leads to a runtime error and, consequently, the semantics of -%any term $t$ (resp. predicate $p$) containing such a construct $c$ is -%undefined as soon as $e$ has to be evaluated in order to evaluate $t$ -%(resp. $p$). The \eacsl Reference Manual also states that \emph{``it is the -%responsibility of each tool which interprets \eacsl to ensure that an -%undefined term is never evaluated''}~\cite{eacsl}. -% -%Accordingly, the \eacsl plug-in prevents an undefined term from being -%evaluated. If an annotation contains such a term, \eacsl will report a proper -%error at runtime instead of evaluating it. -% -%Consider for instance the following annotated program. -% -%\listingname{rte.i} -%\cinput{examples/rte.i} -% -%The terms and predicates containing the modulo `\%' in the \T{assumes} clause -%are undefined in the context of the \T{main} function call since the second -%argument is equal to 0. However, we can generate an instrumented code and -%compile it using the following command: -% -%\begin{shell} -%\$ e-acsl-gcc.sh -c -omonitored_rte.i -Orte rte.i -%\end{shell} -% -%Now, when \T{rte.e-acsl} is executed, you get the following output indicating -%that your function contract is invalid because it contains a division by zero. -% -%\begin{shell} -%\$ ./rte.e-acsl -%RTE failed at line 5 in function is_dividable. -%The failing predicate is: -%division_by_zero: y != 0. -%Aborted -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Libc}\index{Libc} -% -%By default, \framac uses its own standard library (\emph{aka} libc) instead of -%the one of your system. If you do not want to use it, you have to add the option -%\shortopt{no-framac-stdlib}\optionidx{-}{framac-stdlib} to the \framac command -%line. -% -%It might be particularly useful in one of the following contexts: -%\begin{itemize} -%\item several libc functions used by the analyzed program are still not defined -% in the \framac libc; or -%\item your system libc and the \framac libc mismatch about several function -% types (for instance, your system libc is not 100\% POSIX compliant); or -%\item several \framac lib functions get a contract and you are not interested in -% verifying them (for instance, only checking undefine behaviors matters). -%\end{itemize} -% -%\begin{important} \index{eacslgcc} -%Notably, \eacslgcc disables \framac libc by default. This is because most of -%its functions are annotated with \eacsl contracts and generating code for -%these contracts results in additional runtime overheads. To enforce default -%\framac contracts with \eacslgcc you can pass \shortopt{L} option to the wrapper -%script. -%\end{important} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Integers} -%\label{sec:integers} -%\index{GMP} -% -%\eacsl uses an \texttt{integer}\codeidx{integer} type corresponding to -%mathematical integers which does not fit in any integral \C type. To circumvent -%this issue, \eacsl uses the \gmp library\footnote{\url{http://gmplib.org}}. -%Thus, even if \eacsl does its best to use standard integral types instead of -%\gmp~\cite{sac13,jfla15}, it may generate such integers from time to time. It is -%notably the case if your program contains \lstinline|long long|, either signed -%or unsigned. -% -%Consider for instance the following program. -% -%\listingname{gmp.i} -%\cinput{examples/gmp.i} -% -%Even on a 64-bit machine, it is not possible to compute the assertion with a -%standard \C type. In this case, the \eacsl plug-in generates \gmp code. Note -%that E-ACSL library already includes \gmp, thus no additional arguments need to -%be provided. We can generate and execute the instrumented code as usual via -%the following command: -% -%\begin{shell} -%\$ e-acsl-gcc.sh -omonitored_gmp.i -Ogmp gmp.i -%\$ ./gmp.e-acsl -%\end{shell} -% -%Since the assertion is valid, there is no output in this case. -% -%Note that it is possible to use \gmp arithmetic in all cases (even if not -%required). This option is controlled through \shortopt{g} switch of \eacslgcc -%which passes the \shortopt{e-acsl-gmp-only} option to the \eacsl -%plug-in. However it could slow down the execution of the instrumented program -%significantly. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Memory-related Annotations} -%\label{sec:memory} -% -%The \eacsl plug-in handles memory-related annotations such as -%\lstinline|\valid|. Consider for instance the following program. -% -%\listingname{valid.c} -%\cinput{examples/valid.c} -% -%Even though the first annotation holds, the second annotation (labelled -%\I{freed}) is invalid. This is because at the point of its execution memory -%allocated via \texttt{malloc} has been deallocated and \T{x} is a stale pointer -%referencing unallocated memory. The execution of the instrumented program thus -%reports the violation of the second annotation: -% -%\begin{shell} -%\$ e-acsl-gcc.sh -c -Ovalid valid.c -%\$ ./valid.e-acsl -%Assertion failed at line 11 in function main. -%The failing predicate is: -%freed: \valid(x). -%Aborted -%\end{shell} -% -%To check memory-related predicates (such as \lstinline|\valid|) \eacsl tracks -%memory allocated by a program at runtime. \eacsl records memory blocks in a -%meta-storage tracking each block's boundaries (i.e., its base address and -%length), per-byte initialization and the notion of an address being freeable -%(i.e., whether it can be safely passed to the \texttt{free} function). During -%an execution of the instrumented program code injected by \eacsl -%transformations queries the meta-storage to identify properties of an address -%in question. For instance, during the execution of the above program the -%monitor injected by \eacsl first records information about the memory block -%allocated via a call to \texttt{malloc}. Further, during the execution of the -%first assertion the monitor queries the meta-storage about the address of the -%memory location pointed to by \texttt{x}. Since the location belongs to an -%allocated memory block, the meta-storage identifies it as belonging to the -%tracked allocation. Consequently, the assertion evaluates to true and the -%monitors allows the execution to continue. The second assertion, however, -%fails. This is because the block pointed to by \T{x} has been removed from the -%meta-storage by a call to \T{free}, and thus a meta-storage query identifies -%location \T{x} as belonging unallocated memory space. -% -%\subsubsection{\eacsl Memory Models} -% -%Memory tracking implemented by the \eacsl library comes in two flavours dubbed -%\I{bittree-based} and \I{segment-based} memory models. With the bittree-based -%model meta-storage is implemented as a compact prefix trie called Patricia -%trie~\cite{rv13}, whereas segment-based memory model -%\footnote{Segment-based model of \eacsl has not yet appeared in the literature.} -%is based on shadow memory~\cite{pldi16}. The functionality of the provided -%memory models is equivalent, however they differ in performance. We further -%discuss performance considerations. -% -%The \eacsl models allocate heap memory using a customized version of the -%\dlmalloc\footnote{\url{http://dlmalloc.net/}} memory allocator replacing -%system-wide \T{malloc} and similar functions (e.g., \T{calloc} or \T{free}). -% -%At the level of \eacslgcc you can choose between different memory models using -%the \shortopt{m} switch followed by the name of the memory model to link -%against. For instance, command -%\begin{shell} -%\$ e-acsl-gcc.sh -mbittree -c -Ovalid valid.c -%\end{shell} -%links a program generated from \T{valid.c} against the library implementing the -%bittree-based memory model, whereas the following invocation uses shadow-based -%tracking: -%\begin{shell} -%\$ e-acsl-gcc.sh -msegment -c -Ovalid valid.c -%\end{shell} -%It is also possible to generate executables using both models as follows: -%\begin{shell} -%\$ e-acsl-gcc.sh -msegment,bittree -c -Ovalid valid.c -%\end{shell} -%In this case, \eacslgcc produces 3 executables: \T{valid}, -%\T{valid.e-acsl-segment} and \T{valid.e-acsl-bittree} corresponding to an -%executable generated from the original source code, and the executables -%generated from the \eacsl-instrumented sources linked against -%segment-based and bittree-based memory models. -% -%Unless specified, \eacsl defaults to the segment-based memory model. -% -%\subsubsection{Performance Considerations} -%\label{sec:perf} -% -%As pointed out in the previous section the functionalities provided by both -%segment-based and bittree-based memory models are equivalent but differ in -%performance. The bittree-based model uses compact memory representation of -%metadata. The segment-based model on the other hand uses significantly more -%memory. You can expect over 2x times memory overheads when using it. However It -%is often an order of magnitude faster than the bittree-based -%model~\cite{pldi16}. -% -%%[Need ref here with comparison of the performance. Ideally a reference to the PLDI paper]. -% -%\subsubsection{Maximized Memory Tracking} -% -%By default \eacsl uses static analysis to reduce instrumentation and therefore -%runtime and memory overheads~\cite{scp16}. With respect to memory tracking this -%means that -%\eacsl may omit recording memory blocks irrelevant for runtime analysis. It -%is, however, still possible to systematically instrument the code for handling -%potential memory-related annotations even when it is not required. This -%feature can be enabled using the \shortopt{M} switch of \eacslgcc or -%\shortopt{e-acsl-full-mmodel} option of the \eacsl plug-in. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%% \subsection{Debug Libraries} -%%% \label{sec:runtime-debug} -% -%%% \eacsl memory models mentioned in Section~\ref{sec:memory} are optimized for -%%% performance (via \shortopt{-O2} compiler flag). An \eacsl installation also -%%% provides unoptimized versions of these libraries (suffixed \T{-dbg.a}) aimed at -%%% being used during debugging. -% -%%% The main difference between unoptimized (debug) and optimized (production) -%%% libraries is that debug libraries include assertions validating the correctness -%%% of the RTL itself. For instance, during memory tracking such assertions ensure -%%% that during a run of a program all tracked memory blocks are disjoint. -% -%%% Consider the following program that violates an annotation -%%% in function \T{foo} via a call \T{bar(p+1)} in the main function: -% -%%% \listingname{rte\_debug.i} -%%% \cinput{examples/rte_debug.i} -% -%%% A monitored executable linked against a debug version of \eacsl RTL -%%% can ge generated using -%%% \longopt{rt-debug} flag of \eacslgcc as follows: -% -%%% \begin{shell} -%%% \$ e-acsl-gcc.sh --rt-debug -c -omonitored_rte_debug.c -Orte_debug rte_debug.i -%%% \end{shell} -% -%%% A run of the monitored executable shows violation of the \eacsl annotation. -%%% \begin{shell} -%%% \$ rte_debug.e-acsl -%%% Assertion failed at line 2 in function foo. -%%% The failing predicate is: -%%% \valid(p). -%%% /** Backtrace **************************/ -%%% trace at e_acsl_trace.h:45 -%%% - exec_abort at e_acsl_assert.h:58 -%%% - runtime_assert at e_acsl_assert.h:93 -%%% - foo at monitored_rte_debug.c:92 -%%% - bar at monitored_rte_debug.c:102 -%%% - main at monitored_rte_debug.c:119 -%%% /***************************************/ -%%% Aborted -%%% \end{shell} -% -%%% A run of an executable linked against an unoptimized RTL shows a stack trace on -%%% the instrumented program saved to \T{monitored\_rte\_debug.c}. In the example -%%% above calls to \T{exec\_abort} and \T{runtime\_assert} belong to the \eacsl -%%% RTL, whereas calls to \T{foo}, \T{bar} and \T{main} belong to the input -%%% program. -% -%%% It should be noted that because debug versions of \eacsl RTL are compiled -%%% without compile-time optimizations and execute additional assertions the -%%% runtime overheads of the instrumented programs linked against the debug -%%% versions of \eacsl RTL can be up to 10 times greater than those linked against -%%% the production versions. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Runtime Monitor Behavior} -% -%When a predicate is checked at runtime function -%\texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert} is called. By -%default this function does nothing if the predicate is valid, and prints -%an error message and aborts the execution (by raising an \texttt{ABORT} -%signal via a call to the \C function \texttt{abort}) if the predicate is invalid. -% -%The default behavior of \texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert} -%can be altered using \longopt{fail-with-code=CODE} option of \eacslgcc -%that uses \texttt{exit(CODE)} instead of \texttt{abort}. -% -%For instance, the program generated using the following command -%exits with code \texttt{5} on a predicate violation. -%\begin{shell} -% \$ e-acsl-gcc.sh -c --fail-with-code=5 foo.c -%\end{shell} -% -%Forceful termination of a monitored program can be disabled using -%\longopt{keep-going} option. If specified, \eacslgcc generates code that -%reports each property violated at runtime but allows the monitored program to -%continue execution. \longopt{keep-going} should be used with caution: -%such unterminated executions may lead to incorrect verification results. -% -%\eacslgcc also provides a way to use an alternative definition of -%\texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert}. -% -%Custom implementation of \texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert} -%should be provided in a separate \C (or object file) and respect the -%following signature: -% -%\cinput{examples/assert_sign.c} -% -%Below is an example which prints the validity status of each property but never -%exits. -% -%\listingname{my\_assert.c} -%\cinput{examples/my_assert.c} -% -%A monitored program that uses custom definition of -%\texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert} can then be generated -%as follows using \longopt{external-assert} option of \eacslgcc to replace the -%default implementation of \texttt{\_\_e\_acsl\_assert}\codeidxdef{e\_acsl\_assert} -%with the customized one specified in \texttt{my\_assert.c} -% -%\begin{shell} -%\$ e-acsl-gcc.sh -c first.i my_assert.c -Ofirst --external-assert=my\_assert.c -%\$ ./first.e-acsl -%Assertion at line 3 in function main is valid. -%The verified predicate was: `x == 0'. -%Assertion at line 4 in function main is invalid. -%The verified predicate was: `x == 1'. -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Additional Verifications} % <<< -% -%In addition to checking annotations at runtime, \eacsl is able to detect a few -%errors at runtime. -% -%\subsection{Format String Vulnerabilities} -%\index{Format String Vulnerabilities} -% -%Whenever option \longopt{validate-format-strings} of \eacslgcc is set, \eacsl -%detects format-string vulnerabilities in \texttt{printf}-like function. Below is -%an example which incorrectly tries to print a string through a \texttt{\%d} -%format. -% -%\listingname{printf.c} -%\cinput{examples/printf.c} -% -%This error can be detected by \eacsl as follows. -% -%\begin{shell} -%\$ e-acsl-gcc.sh -Oprintf -c --validate-format-strings printf.c -%\$ ./printf.e-acsl -%printf: directive 1 ('%d') expects argument of type 'int' but the corresponding -%argument has type 'char*' -%Abort -%\end{shell} -% -%\subsection{Incorrect Usage of Libc Functions} -%\index{Libc} -% -%Whenever option \longopt{libc-replacements} of \eacslgcc is set, \eacsl is -%able to detect incorrect usage of the following libc functions: -%\begin{itemize} -%\item \texttt{memcmp} -%\item \texttt{memcpy} -%\item \texttt{memmove} -%\item \texttt{memset} -%\item \texttt{strcat} -%\item \texttt{strncat} -%\item \texttt{strcmp} -%\item \texttt{strcpy} -%\item \texttt{strncpy} -%\item \texttt{strlen} -%\end{itemize} -% -%For instance, the program below incorrectly uses \texttt{strcpy} because the -%destination string should contain at least 7 \texttt{char} (and not only 6). -% -%\listingname{strcpy.c} -%\cinput{examples/strcpy.c} -% -%This error can be reported by \eacsl as follows. -% -%\begin{shell} -%\$ e-acsl-gcc.sh -Ostrcpy -c --libc-replacements strcpy.c -%\$ ./strcpy.e-acsl -%strlen: insufficient space in destination string, at least 7 bytes required -%Abort -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Incomplete Programs} % <<< -%\label{sec:incomplete} -% -%Executing a \C program requires to have a complete application. However, the -%\eacsl plug-in does not necessarily require to have it to generate the -%instrumented code. It is still possible to instrument a partial program with no -%entry point or in which some functions remain undefined. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Programs without Main} -%\label{sec:no-main} -%\index{Program!Without main} -% -%The \eacsl plug-in can instrument a program without the \T{main} function. -%Consider for instance the following annotated program without \texttt{main}. -% -%\listingname{no\_main.i} -%\cinput{examples/no_main.i} -% -%The instrumented code is generated as usual, even though you get an additional -%warning. -%\begin{shell} -%\$ e-acsl-gcc.sh no_main.i -omonitored_no_main.i -%<skip preprocessing command line> -%[e-acsl] beginning translation. -%[e-acsl] warning: cannot find entry point `main'. -% Please use option `-main' for specifying a valid entry point. -% The generated program may miss memory instrumentation. -% if there are memory-related annotations. -%[e-acsl] translation done in project "e-acsl". -%\end{shell} -% -%This warning indicates that the instrumentation could be incorrect if the -%program contains memory-related annotations (see -%Section~\ref{sec:limits:no-main}). That is not the case here, so you can -%safely ignore it. Now, it is possible to compile the generated code with a \C -%compiler in a standard way, and even link it against additional files such as -%the following: -% -%\listingname{main.c} -%\cinput{examples/main.c} -% -%\begin{shell} -%\$ e-acsl-gcc.sh -C monitored_no_main.i main.c -Owith_main -%\$ ./with_main.e-acsl -%x = 65536 -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%\ -%\subsection{Undefined Functions} -%\label{sec:no-code} -%\index{Function!Undefined} -% -%Similar to the above section \eacsl is capable of instrumenting a program which -%has definitions of arbitrary functions missing. Consider for instance the -%following annotated program for which the implementation of the function -%\texttt{my\_pow} is not provided. -% -%\listingname{no\_code.c} -%\cinput{examples/no_code.c} -% -%The instrumented code is generated as usual, even though you get an additional -%warning. -%\begin{shell} -%\$ e-acsl-gcc.sh -omonitored_no_code.i no_code.c -%[e-acsl] beginning translation. -%[e-acsl] warning: annotating undefined function `my_pow': -% the generated program may miss memory instrumentation -% if there are memory-related annotations. -%no_code.c:9:[kernel] warning: No code nor implicit assigns clause for function my_pow, -%generating default assigns from the prototype -%[e-acsl] translation done in project "e-acsl". -%\end{shell} -% -%Similar to the previous Section the warning indicates that the instrumentation -%could be incorrect if the program contains memory-related annotations (see -%Section~\ref{sec:limits:no-code}). That is still not the case here, so you can -%safely ignore it. -% -%\listingname{pow.i} -%\cinput{examples/pow.i} -% -%Similar to the example shown in the previous section you can generate the -%executable by providing definition of \T{my\_pow}. -% -%\begin{shell} -%\$ e-acsl-gcc.sh -C -Ono_code pow.i monitored_no_code.i -%\$ ./no_code.e-acsl -%Postcondition failed at line 5 in function my_pow. -%The failing predicate is: -%\old(n % 2 == 0) ==> \result >= 1. -%Aborted -%\end{shell} -% -%The execution of the corresponding binary fails at runtime: actually, our -%implementation of the function \texttt{my\_pow} overflows in case of large -%exponentiations. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Combining E-ACSL with Other Plug-ins} % <<< -%\label{sec:combine} -% -%As the \eacsl plug-in generates a new \framac project, it is easy to run any -%plug-in on the generated program, either in the same \framac session (thanks to -%the option \shortopt{then} or through the GUI), or in another one. The only -%issue might be that, depending on the plug-in, the analysis may be imperfect if -%the generated program uses \gmp or the dedicated memory library: both make -%intensive use of dynamic structures which are usually difficult to handle by -%analysis tools. -% -%Another way to combine \eacsl with other plug-ins is to run \eacsl last. For -%instance, the \rte plug-in~\cite{rte} may be used to generate annotations -%corresponding to runtime errors. Then the \eacsl plug-in may generate an -%instrumented program to verify that there are no such runtime errors during the -%execution of the program. -% -%Consider the following program. -% -%\listingname{combine.i} -%\cinput{examples/combine.i} -%To check at runtime that this program does not perform any runtime error (which -%are potential overflows in this case), just do: -% -%\begin{shell} -%\$ frama-c -rte combine.i -then -e-acsl -then-last -print -% -ocode monitored_combine.i -%\$ e-acsl-gcc.sh -C -Ocombine monitored_combine.i -%\$ ./combine. -%\end{shell} -% -%Automatic assertion generation can also be enabled via \eacslgcc directly via -%the following command: -%\begin{shell} -%\$ e-acsl-gcc.sh -c -Ocombine -omonitored_combine.i --rte=all -%\end{shell} -% -%Note the use of \eacslgcc \longopt{rte} option which asks \framac to combine -%\eacsl with \rte plug-in. This plug-in automatically instruments the input -%program with runtime -%assertions before running the \eacsl plug-in on it. \longopt{rte} option of -%\eacslgcc also accepts a comma-separated list of arguments indicating the types -%of assertions to generate. Consult the \eacslgcc man page for a detailed -%list of arguments accepted by \longopt{rte}. -% -%The present implementation of \eacslgcc does not fully support combining \eacsl -%with other \framac plug-ins. However it is still possible to instrument programs -%with \eacsl directly and use \eacslgcc to compile the generated code. -% -%If you run the \eacsl plug-in after another one, it first generates -%a new temporary project in which it links the analyzed program against its own -%library in order to generate the \framac internal representation of the \C -%program (\emph{aka} AST), as explained in Section~\ref{sec:run}. Consequently, -%even if the \eacsl plug-in keeps the maximum amount of information, the results -%of already executed analyzers (such as validity status of annotations) are not -%known in this new project. If you want to keep them, you have to set the option -%\shortopt{e-acsl-prepare} when the first analysis is asked for. -% -%In this context, the \eacsl plug-in does not generate code for annotations -%proven valid by another plug-in, except if you explicitly set the option -%\shortopt{e-acsl-valid}. For instance, \Eva~\cite{eva} is able to -%prove that there is no potential overflow in the previous program, so the \eacsl -%plug-in does not generate additional code for checking them if you run the -%following command. -%\begin{shell} -%\$ frama-c -e-acsl-prepare -rte combine.i -then -val -then -e-acsl \ -% -then-last -print -ocode monitored_combine.i -%\end{shell} -%The additional code will be generated with one of the two following commands. -%\begin{shell} -%\$ frama-c -e-acsl-prepare -rte combine.i -then -val -then -e-acsl \ -% -e-acsl-valid -then-last -print -ocode monitored_combine.i -%\$ frama-c -rte combine.i -then -val -then -e-acsl \ -% -then-last -print -ocode monitored_combine.i -%\end{shell} -%In the first case, that is because it is explicitly required by the option -%\shortopt{e-acsl-valid} while, in the second case, that is because the option -%\shortopt{e-acsl-prepare} is not provided on the command line which results in -%the fact that the result of the value analysis are unknown when the \eacsl -%plug-in is running. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Customization} % <<< -%\label{sec:custom} -% -%There are several ways to customize the \eacsl plug-in. -% -%First, the name of the generated project -- which is \texttt{e-acsl} by default -%-- may be changed by setting the option \shortopt{e-acsl-project} option of the -%\eacsl plug-in. While there is not direct support for this option in \eacslgcc -%you can pass it using \shortopt{F} switch: -% -%\listingname{check.i} -%\cinput{examples/check.i} -% -%\begin{shell} -%\$ e-acsl-gcc.sh -F"-e-acsl-project foobar" check.i -%<skip preprocessing commands> -%[e-acsl] beginning translation. -%check.i:4:[e-acsl] warning: E-ACSL construct `right shift' is not yet supported. -% Ignoring annotation. -%[e-acsl] translation done in project "foobar". -%\end{shell} -% -%Further, the \eacsl plug-in option \shortopt{e-acsl-check} does not generate a -%new project but verifies that each annotation is translatable. Then it produces -%a summary as shown in the following example (left or right shifts in annotations -%are not yet supported by the \eacsl plug-in~\cite{eacsl-implem}). -% -%\begin{shell} -%\$ frama-c -e-acsl-check check.i -%<skip preprocessing commands> -%check.i:4:[e-acsl] warning: E-ACSL construct `left/right shift' is not yet supported. -% Ignoring annotation. -%[e-acsl] 0 annotation was ignored, being untypable. -%[e-acsl] 1 annotation was ignored, being unsupported. -%\end{shell} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%>>> -%\section{Verbosity Policy} % <<< -%\label{sec:verbose} -% -%By default, \eacsl does not provide much information when it is running. Mainly, -%it prints a message when it begins the translation, and another one when the -%translation is done. It may also display warnings when something requires the -%attention of the user, for instance if it is not able to translate an -%annotation. Such information is usually enough but, in some cases, you might -%want to get additional control on what is displayed. As quite usual with -%\framac plug-ins, \eacsl offers two different ways to do this: the verbosity -%level which indicates the \emph{amount} of information to display, and the -%message categories which indicate the \emph{kind} of information to display. -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Verbosity Level} -% -%The amount of information displayed by the \eacsl plug-in is settable by the -%option \shortopt{e-acsl-verbose}. It is 1 by default. Below is indicated -%which information is displayed according to the verbosing level. The level $n$ -%also displays the information of all the lower levels. -% -%\begin{center} -%\begin{tabular}{|l|l|} -%\hline -%\shortopt{e-acsl-verbose 0}& only warnings and errors\\ -%\hline -%\shortopt{e-acsl-verbose 1}& beginning and ending of the translation\\ -%\hline -%\shortopt{e-acsl-verbose 2}& different parts of the translation and -% functions-related information\\ -%\hline -%\shortopt{e-acsl-verbose 3}& predicates- and statements-related information\\ -%\hline -%\shortopt{e-acsl-verbose 4}& terms- and expressions-related information -%\\ -%\hline -%\end{tabular} -%\end{center} -% -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% -% -%\subsection{Message Categories} -% -%The kind of information to display is settable by the option -%\shortopt{e-acsl-msg-key} (and unsettable by the option -%\shortopt{e-acsl-msg-key-unset}). The different keys refer to different -%parts of the translation, as detailed below. -% -%\begin{center} -%\begin{tabular}{|l|l|} -%\hline -%analysis & minimization of the instrumentation for memory-related annotation -%(section~\ref{sec:memory}) \\ -%\hline -%duplication & duplication of functions with contracts -%(section~\ref{sec:no-code}) \\ -%\hline -%translation & translation of an annotation into \C code\\ -%\hline -%typing & minimization of the instrumentation for integers -%(section~\ref{sec:integers}) \\ -%\hline -%\end{tabular} -%\end{center} -% >>> diff --git a/doc/userman/eacslversion.tex b/doc/userman/eacslversion.tex deleted file mode 100644 index 921e0d99cef3b2c72913f553471f5ec89617206e..0000000000000000000000000000000000000000 --- a/doc/userman/eacslversion.tex +++ /dev/null @@ -1,2 +0,0 @@ -\newcommand{\eacslversion}{Chlorine-20180501+dev\xspace} -\newcommand{\fcversion}{Chlorine\xspace} diff --git a/doc/userman/fclangversion.tex b/doc/userman/fclangversion.tex index b542d4126cab709a71bfa1934a8ecb8c0c3fa335..67275e01473cf88945e1ea40a0ff787bfb06c503 100644 --- a/doc/userman/fclangversion.tex +++ b/doc/userman/fclangversion.tex @@ -1,2 +1,2 @@ \newcommand{\fclangversion}{Chlorine+\xspace} -\newcommand{\fcversion}{Chlorine+\xspace} +\newcommand{\fcversion}{Potassium+\xspace} diff --git a/doc/userman/introduction.tex b/doc/userman/introduction.tex index 809a738bcb14a39382a74bf00cb90a1166a51777..337d6f3cc427aadd75397f5ccb55357ee4594e94 100644 --- a/doc/userman/introduction.tex +++ b/doc/userman/introduction.tex @@ -8,6 +8,22 @@ The \fclang plug-in supports the \acslpp extension of \acsl for C++ programs and it is built on the Clang\footnote{https://clang.llvm.org/} compiler infrastructure and uses Clang for parsing C++. +The \fclang plug-in intends to provide a full translation of C++ and \acslpp into the \framac internal representation, and from there be able to be analyzed by other \framac plug-ins. +This is a work in progress. +The following sections describe the limitations of the current implementation. +\begin{itemize} + \item The plug-in aims for the C++11 version of C++ + \item \acslpp is described in the companion \acslpp reference manual, also a part of the \framac release. +\end{itemize} + + +Notes +\begin{itemize} + \item Which version of Clang? + \item Clang (8) support C++98 (except exported templates, later removed) and C++11 and current draft standard for C++1y + \item see https://clang.llvm.org/docs/UsersManual.html\#cxx for supported features in clang C++ +\end{itemize} + %The \fclang version you are using is indicated by the %command \texttt{frama-c -e-acsl-version}\optionidx{-}{e-acsl-version}. \fclang %automatically translates an annotated C program into another program that fails diff --git a/doc/userman/limitations.tex b/doc/userman/limitations.tex index 29097433073d54da1faec70044a77d65c5f2aea8..4e267f0956d3eb3187a540c190bfe9b8854c0201 100644 --- a/doc/userman/limitations.tex +++ b/doc/userman/limitations.tex @@ -1,188 +1,3 @@ \chapter{Known Limitations} The development of the \fclang plug-in is still ongoing. -%First, the \eacsl -%reference manual~\cite{eacsl} is not yet fully supported. Which annotations can -%already be translated into \C code and which cannot is defined in a separate -%document~\cite{eacsl-implem}. Second, even though we do our best to avoid them, -%bugs may exist. If you find a new one, please report it on the bug tracking -%system\footnote{\url{http://bts.frama-c.com}} (see Chapter 10 of the \framac -%User Manual~\cite{userman}). Third, there -%are some additional known limitations, which could be annoying for the user in -%some cases, but are tedious to lift. Please contact us if you are interested in -%lifting these limitations\footnote{Read \url{http://frama-c.com/support.html} -% for additional details.}. - -%\section{Uninitialized Values} -%\index{Uninitialized value} -% -%As explained in Section~\ref{sec:runtime-error}, the \eacsl plug-in should never -%translate an annotation into \C code which can lead to a runtime error. This is -%enforced, except for uninitialized values which are values read before having -%been written. -% -%\listingname{uninitialized.i} -%\cinput{examples/uninitialized.i} -% -%If you generate the instrumented code, compile it, and finally execute it, you -%may get no runtime error depending on your \C compiler, but the behavior is -%actually undefined because the assertion reads the uninitialized variable -%\lstinline|x|. You should be caught by the \eacsl plug-in, but that is not -%the case yet. -%\begin{shell} -% -%\$ e-acsl-gcc.sh uninitialized.i -c -Omonitored_uninitialized -%monitored_uninitialized.i: In function 'main': -%monitored_uninitialized.i:44:16: warning: 'x' is used uninitialized in this function -%[-Wuninitialized] -%\$ ./monitored_uninitialized.e-acsl -%\end{shell} -% -%This is more a design choice than a limitation: should the \eacsl plug-in -%generate additional instrumentation to prevent such values from being evaluated, -%the generated code would be much more verbose and slower. -% -%If you really want to track such uninitializations in your annotation, you have -%to manually add calls to the \eacsl predicate -%\lstinline|\initialized|~\cite{eacsl}. -% -%\section{Incomplete Programs} -% -%Section~\ref{sec:incomplete} explains how the \eacsl plug-in is able to handle -%incomplete programs, which are either programs without main, or programs -%containing undefined functions (\emph{i.e.} functions without body). -% -%However, if such programs contain memory-related annotations, the generated code -%may be incorrect. That is made explicit by a warning displayed when the \eacsl -%plug-in is running (see examples of Sections~\ref{sec:no-main} and -%\ref{sec:no-code}). -% -%\subsection{Programs without Main} -%\index{Program!Without main} -%\label{sec:limits:no-main} -% -%The instrumentation in the generated program is partial for every program -%without main containing memory-related annotations, except if the option -%\optionuse{-}{e-acsl-full-mmodel} or the \eacsl plug-in (of \shortopt{M} option -%of \eacslgcc) is provided. In that case, violations of such annotations are -%undetected. -% -%Consider the following example. -% -%\listingname{valid\_no\_main.c} -%\cinput{examples/valid_no_main.c} -% -%You can generate the instrumented program as follows. -%\begin{shell} -%\$ e-acsl-gcc.sh -ML -omonitored_valid_no_main.i valid_no_main.c -%<skip preprocessing commands> -%[e-acsl] beginning translation. -%<skip warnings about annotations from the Frama-C libc -% which cannot be translated> -%[kernel] warning: no entry point specified: -% you must call function `__e_acsl_memory_init' by yourself. -%[e-acsl] translation done in project "e-acsl". -%\end{shell} -% -%The last warning states an important point: if this program is linked against -%another file containing \texttt{main} function, then this main function must -%be modified to insert a call to the function \texttt{\_\_e\_acsl\_memory\_init} -%\index{e\_acsl\_memory\_init@\texttt{\_\_e\_acsl\_memory\_init}} at the very -%beginning. This function plays a very important role: it initializes metadata -%storage used for tracking of memory blocks. Unless this call is inserted the -%run of a modified program is likely to fail. -% -%While it is possible to add such intrumentation manually we recommend using -%\eacslgcc. Consider the following incomplete program containing \T{main}: -% -%\listingname{modified\_main.c} -%\cinput{examples/modified_main.c} -% -%Then just compile and run it as explained in Section~\ref{sec:memory}. -% -%\begin{shell} -%\$ e-acsl-gcc.sh -M -omonitored_modified_main.i modified_main.c -%\$ e-acsl-gcc.sh -C -Ovalid_no_main monitored_modified_main.i monitored_valid_no_main.i -%\$ ./valid_no_main.e-acsl -%Assertion failed at line 11 in function f. -%The failing predicate is: -%freed: \valid(x). -%Aborted -%\end{shell} -% -%Also, if the unprovided main initializes some variables, running the -%instrumented code (linked against this main) could print some warnings from the -%\eacsl memory library\footnote{see -% \url{https://bts.frama-c.com/view.php?id=1696} for an example}. -% -%\subsection{Undefined Functions} -%\label{sec:limits:no-code} -%\index{Function!Undefined} -% -%The instrumentation in the generated program is partial for a program $p$ if and -%only if $p$ contains a memory-related annotation $a$ and an undefined function -%$f$ such that: -%\begin{itemize} -%\item either $f$ has an (even indirect) effect on a left-value occurring in $a$; -%\item or $a$ is one of the post-conditions of $f$. -%\end{itemize} -%A violation of such an annotation $a$ is undetected. There is no workaround yet. -% -%Also, the option \optionuse{-}{e-acsl-check} does not verify the annotations of -%undefined functions. There is also no workaround yet. -% -%\section{Recursive Function} -%\index{Function!Recursive} -% -%Programs containing recursive functions have the same limitations as the ones -%containing undefined functions (Section~\ref{sec:limits:no-code}) and -%memory-related annotations. -% -%%% JS: this issue should have been fixed: -%%% Also, even though there is no such annotations, the generated code may call a -%%% function before it is declared. When this behavior appears remains -%%% unspecifed. The generated code is however easy to fix by hand. -% -%\section{Variadic Functions} -%\index{Function!Variadic} -% -%Programs containing undefined variadic functions with contracts -%are not yet supported. Using the \variadic plug-in of \framac could be a -%solution in some cases, but its generated code cannot always be compiled. -% -%\section{Function Pointers} -%\index{Function!Pointer} -% -%Programs containing function pointers have the same limitations on -%memory-related annotations as the ones containing undefined or recursive -%functions. -% -%\section{Requirements to Input Programs} -%\index{Function!Input} -% -%\subsection{\eacsl Namespace} -%While \eacsl uses source-to-source transformations and not binary -%instrumentations it is important that the source code provided at input does -%not contain any variables or functions prefixed \T{\_\_e\_acsl\_}. \eacsl -%reserves this namespace for its transformations, and therefore an input program -%containing such symbols beforehand may fail to be instrumented or compiled. -% -%\subsection{Memory Management Functions} -%Programs providing custom definitions of \T{syscall}, \T{mmap} or -%\T{sbrk} should be rejected. Also, an input programs should not modify -%memory-management functions namely \T{malloc}, \T{calloc}, \T{realloc}, -%\T{free}, \T{cfree}, \T{posix\_memalign} and \T{aligned\_alloc}. \eacsl relies -%on these functions in order to track heap memory. Further, correct heap memory -%monitoring requires to limit allocation and deallocation of heap memory to -%POSIX-compliant memory management functions -%listed above. Monitoring of programs that allocate memory using non-standard or -%obsolete functions (e.g., \T{valloc}, \T{memalign}, \T{pvalloc}) may not work -%correctly. -% -%\section{Limitations of E-ACSL Monitoring Libraries} -%\index{Function!Libraries} -% -%\eacsl monitoring library implementing segment-based memory model is fairly -%recent and presently has very limited support for 32-bit architectures. When -%using a 32-bit machine we recommend using the bittree-based memory model -%instead. diff --git a/doc/userman/macros.tex b/doc/userman/macros.tex index e92cba83f6bc8f418a26e41906c122a482de4a20..7140fbbfb250b6776b15563ab0b28836efc8f51a 100644 --- a/doc/userman/macros.tex +++ b/doc/userman/macros.tex @@ -44,6 +44,10 @@ \newcommand{\changeinsection}[2]{\textbf{Section \ref{sec:#1}:} #2.} \newcommand{\todo}[1]{{\large \textbf{TODO: #1.}}} +\newcommand{\option}[1]{\textbf{#1}} +\newcommand{\irg}{\lstinline|framaCIRGen|i\xspace} +\newcommand{\fcl}{\lstinline|frama-clang|\xspace} +\newcommand{\fc}{\lstinline|frama-c|\xspace} \newcommand{\markdiff}[1]{{\color{blue}{#1}}} \newenvironment{markdiffenv}[1][]{% diff --git a/doc/userman/main.tex b/doc/userman/main.tex index ece6fd6bf82296401207304a97fad3a587876554..6ecb4f66c6e78919c5218e3dc127ba180e5b40a4 100644 --- a/doc/userman/main.tex +++ b/doc/userman/main.tex @@ -64,12 +64,23 @@ evolve in the future. We gratefully thank the people who contributed to this document: David R. Cok, Virgile Prevosto, Armand Pucetti, Franck Verdrine. -TBD-VESSEDIA +This work is done in the context of project VESSEDIA, +which received funding from the European Union's 2020 +Research and Innovation Program under grant agreement +No. 731453. + %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \include{introduction} \include{description} +\include{limitations} + +\include{preprocessing} + +\section{\acslpp constructs} + +\textit{This section is not yet written} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% diff --git a/doc/userman/preprocessing.tex b/doc/userman/preprocessing.tex new file mode 100644 index 0000000000000000000000000000000000000000..eee212c42b00eb9bde4a7d651bcc3ef279926001 --- /dev/null +++ b/doc/userman/preprocessing.tex @@ -0,0 +1,93 @@ +\section{Preprocessing} + +As a refresher, the C/C++ preprocessor (CPP) (cf. \url{https://gcc.gnu.org/onlinedocs/cpp/}) conceptually implements the following operations on a C++ source file: +\begin{itemize} + \item The input is read and broken it into a sequence of physical lines according to the line terminators (ASCII character sequences \\r, \\n, or \\r\\n). + \item Each C trigraph is replaced by its corresponding character. + \item Any backslash-whitespace-line-terminator sequence is removed and the line that it ends is combined with the following line. + \item Comments are replaced by single spaces. This requires tokenizing the input to avoid recognizing comment markers within strings as indicating a comment. Note that this allows block comments to hide line terminations. + \item The input text is divided into preprocessing tokens grouped in logical lines. Each preprocessor token becomes a compiler token (except where \#\# concatenation occurs). However, \acslb tokens are slightly different, as described below. + \item The source text is transformed according to any preprocessing directives contained within it. Each preprocessing directive must be contained within one logical line + +\end{itemize} +The result is a sequence of preprocessing tokens that is passed on to the +remaining compiler phases. + +\subsection{Trigraphs} + +TBD - are these supported by clang? + +\subsection{Preprocessor tokens} +Preprocessor tokens (per CPP) belong to one of five categories. White space (space, tab, TBD) serves only to separate tokens; it is not needed between tokens whose concatenation is not a single token. Line terminators also separate tokens and also delineate certain features: preprocessing directives and string literals do not extend over more than one (logical) line. +\begin{itemize} + \item identifiers: character sequences that match the regular expression \texttt{[a-zA-Z\_][a-zA-Z\_0-9]*} . Dollar signs are allowed as non-digit identifier characters if the clang option \texttt{-fdollars-in-identifiers} is used. (TBD - on by default?) + \item number: character sequences that match the regular expression \texttt{[.]?[0-9]([0-9a-zA-Z.]|[eEpP][+-]))*} . (TBD - dollar signs also allowed?) + \item string literals: character sequence enclosed in " " or ' ' or < >, with \textbackslash " for " in a double-quoted literal (that is not a header file name) and \' for ' in a single-quoted literal. + \item punctuator: all single non-alphanumeric printable characters except \@, \# and `, and all multi-punctuation sequences that meaningful to C/C++ (e.g. >>>= ), but not an arbitrary multi-punctuation character sequence. + \\item other tokens: \@, \#, ` and all non-ASCII single characters. +\end{itemize} +TBD - unicode characters +Note that not all preprocessor tokens are valid C/C++ parser tokens. Tokens in the other category have no meaning to C/C++ and the \texttt{number} category allows many sequences that are not legal C/C++ numeric tokens. These tokens will generally provoke compiler errors. For example in C/C++, 0..2 is one token and is not interpreted as two consecutive numeric tokens. + +\acsl and \acslpp have slightly different tokens than the above, so the preprocessor tokens need to be re-tokenized in some cases: +\begin{itemize} + \item The \@ token is a whitespace character in \acslb. + \item There are some \acslb multi-character punctuator tokens that are not + preprocessor tokens: + \begin{itemize} + \item[] $==>$ (logical implies) + \item[] $-->$ (bit-wise implies) + \item[] $<==>$ (logical equality) + \item[] $<-->$ (bit-wise equality) + \item[] TBD any more? + \end{itemize} + These \acslb tokens need to be assembled from multiple CPP tokens (and those CPP tokens must not be separated by white space) + \item A CPP numeric token that contains .. will not be a legal C/C++ number, but may be multiple legal \acslb tokens with the .. representing the range operator. For example, the single CPP token \texttt{0..last} is retokenized for \acslb as if it were written \texttt{0 .. last} . + \item \acslb allows certain built-in non-ASCII symbols, namely + \begin{itemize} + \item[] $\forall$ (unicode 0x2200) - universal quantifier + \item[] $\exists$ (unicode 0x2203) - existential quantifier +% \item[] $\equal$ (unicode 0x2261) - equality +% \item[] $\notequal$ (unicode 0x2262) - inequality + \item[] $\leq$ (unicode 0x2264) - less than or equal + \item[] $\geq$ (unicode 0x2265) - greater than or equal +% \item[] $\minus$ (unicode 0x2212) - minus +% \item[] $\implies$ (unicode 0x21D2) - implies +% \item[] $\equiv$ (unicode 0x21D4) - equivalence + \item[] $\bigwedge$ (unicode 0x2227) - logical and + \item[] $\bigvee$ (unicode 0x2228) - logical or + \item[] $\neg$ (unicode 0x00AC) - logical negation +% \item[] $\logicalxor$ (unicode 0x22BB) - logical inequivalence + \item[] $\subset$ (unicode 0x2208) - subset +% \item[] $\Boolean$ (unicode 0x1D539) - set of booleans +% \item[] $\Integer$ (unicode 0x2124) - set of integers +% \item[] $\Real$ (unicode 0x211D) - set of reals + \item TBD - more - ensure unicode equivalent + \end{itemize} +\end{itemize} + +\subsection{Preprocessor directives} +A preprocessing directive consists of a line (after the previous preprocessing phaseshave been completed) that begins with optional white space, the '\#' character, additional optional white space, and a preprocessor directive identifier. +The preprocessing language contains a fixed set of preprocessing directive identifiers: +\begin{itemize} + \item \texttt{define}, \texttt{undef} + \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} + \item \texttt{warning}, \texttt{error} + \item \texttt{include} + \item \texttt{line} + \item \texttt{pragma} +\end{itemize} +In addition, identifiers that have been defined (by a \#define directive) as macros are expanded according to the macro expansion rules (not described here). + +Now \acslb annotations are contained in C/C++ comments. +Consequently, any directives contained in the annotation are not seen when the source file is processed simply as a C/C++ source file. However, the effect of some directives lasts until the end of the source file. +Accordingly, \acslpp imposes constraints on the directives that may be present within annotations: +\begin{itemize} + \item \texttt{define} and \texttt{undef} are not permitted in an annotation + \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} are permitted but must be completely nested within the annotation in which they appear (an \#endif and its corresponding \#if/\#ifdef/\#ifndef must be in the same annotation comment.) + \item \texttt{warning} and \texttt{error} are permitted + \item \texttt{include} is permitted, but will cause errors if it contains, as is almost always the case, either \texttt{define} or \texttt{pragma} directives + \item \texttt{line} - TBD + \item \texttt{pragma} is not permitted +\end{itemize} +