From ebc967bc9391061c1921956ba5d2cb6fc737e9d6 Mon Sep 17 00:00:00 2001 From: davidcok <davidcok@github.com> Date: Thu, 1 Aug 2019 22:21:19 +0200 Subject: [PATCH] Edits from readover --- doc/userman/description.tex | 155 ++++++++++++++++++---------------- doc/userman/fclangversion.tex | 5 +- doc/userman/grammar.tex | 17 ++-- doc/userman/introduction.tex | 25 +++--- doc/userman/limitations.tex | 11 ++- doc/userman/macros.tex | 11 ++- doc/userman/main.tex | 59 +++++++------ doc/userman/preprocessing.tex | 80 +++++++++++------- 8 files changed, 203 insertions(+), 160 deletions(-) diff --git a/doc/userman/description.tex b/doc/userman/description.tex index 3403acba..e63d1616 100644 --- a/doc/userman/description.tex +++ b/doc/userman/description.tex @@ -8,22 +8,26 @@ The instructions for doing so are provided at \fclang depends on two software packages: \begin{itemize} -\item A current version of \framac itself. It is highly recommended to install \framac using \lstinline|opam|, as described in the installation procedures for \framac (\url{https://frama-c.com/download.html}) +\item A current version of \framac itself. It is highly recommended to install \framac using \lstinline|opam|, as described in the installation procedures for \framac (\url{https://frama-c.com/download.html}). +Version \fclangversion of \fclang is compatible with version \fcversion of \framac. \item An installation of Clang, which is available as part of LLVM, which itself is available from \url{http://releases.llvm.org}. +Version \fclangversion of \fclang is compatible with version \clangversion of \clang. \end{itemize} Building and installing \fclang has two effects: \begin{itemize} -\item The \fclang executable files are installed within the \framac installation. In particular, if \framac has been installed using \lstinline|opam|, then \irg will be installed in the -\lstinline|opam| \lstinline|bin| directory. You must be sure that this directory is on your system \verb|$PATH| (try \lstinline|which framaCIRGen| after installation to be sure). -\item Include files containing \acslpp specifications of C++ library functions +\item The \fclang executable files are installed within the \framac installation. +In particular, if \framac has been installed using \lstinline|opam|, then the principal executable \irg will be installed in the +\lstinline|opam| \lstinline|bin| directory. +You must be sure that this directory is on your system \verb|$PATH| (try \lstinline|which framaCIRGen| after installation to be sure). +\item Include files containing \acslpp specifications of \cpp library functions are copied to \verb|$FRAMAC_SHARE/libc| and \verb|$FRAMAC_SHARE/frama-clang/libc++|, where \verb|$FRAMAC_SHARE| is the path given by the command \lstinline|frama-c-config -print-share-path|. \end{itemize} These include files are replacements for the standard system include files. -They should have the same definitions of C and C++ functions and classes, but +They should have the same definitions of \C and \cpp functions and classes, but with \acslpp annotations giving their specifications. The plugin can be built by hand from source using the following commands. Create a new directory to which you download and unpack the source distribution. Then \lstinline|cd| into the source directory itself (one level down) and execute: @@ -34,52 +38,53 @@ autoconf -f make make install \end{listing-nonumber} -\lstinline|<opamroot>| should be the absolute path of the root of the opam installation, e.g. something like \lstinline|~/.opam/4.05.0| or \lstinline|~/.opam/system|. If \framac has been installed independently of \lstinline|opam|, then set \lstinline|<opamroot>| above so that the location of the \framac executable (that is, the result of executing \lstinline|which frama-c|) is +\lstinline|<opamroot>| should be the absolute path of the root of the opam installation, e.g., something like \lstinline|~/.opam/4.05.0| or \lstinline|~/.opam/system|. If \framac has been installed independently of \lstinline|opam|, then set \lstinline|<opamroot>| above so that the location of the \framac executable (that is, the result of executing \lstinline|which frama-c|) is \lstinline|<opamroot>/bin/frama-c|. \chapter{Running the plug-in} -\section{C++ files} -Once installed the plugin is run automatically by Frama-C on any C++ files listed on the command-line. C++ files are identified by their filename suffixes. The default suffixes recognized as C++ are these: +\section{\cpp files} +Once installed the plugin is run automatically by \framac on any \cpp files listed on the command-line. \cpp files are identified by their filename suffixes. The default suffixes recognized as \cpp are these: \lstinline| .cpp, .C, .cxx, .ii, .ixx, .ipp, .i++, .inl, .h, .hh| Currently this set of suffixes is compiled in the plugin (in file \texttt{frama\_Clang\_register.ml}) and can only be changed by recompiling and reinstalling the plugin. \section{Frama-clang executable} -The plug-in operates by invoking the executable \irg (which must be on the system \verb|\$PATH|) -on each file identified as C++, in turn. -For each file it produces a temporary output file containing the C AST, which is then given as input to \framac. +The plug-in operates by invoking the executable \irg (which must be on the system \verb|$PATH|) +on each file identified as \cpp, in turn. +For each file it produces a temporary output file containing the \C AST, which is then given as input to \framac. This executable is a single-file-at-a-time command-line executable only. Various options control its behavior. The file-system path identifying the executable is provided by the \textbf{-cxx-clang-command <cmd>} -option and is \irg by default. The path may be absolute; if it is a relative path, it is found by searching the system \verb|\$PATH|. +option and is \irg by default. The path may be absolute; if it is a relative path, it is found by searching the system \verb|$PATH|. -The launching of \irg by \framac includes the following options by default: +The launching of \irg by \framac includes the following options by default. The \fc option \lstinline|-fclang-msg-key=clang| will show (among other information) the internal command-line being invoked. \begin{itemize} \item \verb|-target <target>| with the target being set according to the \lstinline|-machdep| and \lstinline|-target| options given to \framac (cf. \S\ref{sec:bit}) \item \verb|-D__FC_MACHDEP_86_32| -- also set according to the chosen architecture \item \verb|-std=c++11| -- target C++11 features \item \verb|-nostdinc| -- use \fcl and \framac system header files, and not the compiler's own header files -\item \verb|-I$FRAMAC_SHARE/frama-clangs/libc++ -I$FRAMAC_SHARE/libc| -- include the \fclang and \framac header files, which contain system library definitions with \acslpp annotations +\item \verb|-I$FRAMAC_SHARE/frama-clangs/libc++ -I$FRAMAC_SHARE/libc| -- include the \fclang and \framac header files, which contain system library definitions with \acslpp annotations (the paths used are controlled by the \fc options \lstinline|-cxx-c++stdlib-path| annd \lstinline|-cxx-cstdlib-path|). \item \verb|-stop-annot-error| if the corresponding option (\lstinline|-fclang-warn-key=annot-error=abort|) is given to \framac -\item options to set the level of info messages and warning messages +\item options to set the level of info messages and warning messages, based on options on the \fc command-line \end{itemize} The PARSING section of the output of \lstinline|frama-c -kernel-h| lists some options for controlling the behavior described above. Also see the options listed by \lstinline|frama-c -fclang-h| such as \lstinline|-cxx-stdlib-path|, \lstinline|-cxx-cstdlib-path|, \lstinline|-cxx-nostdinc|, \lstinline|-cxx-stdinc|. \section{Frama-clang options} -The options controlling \fclang are of three sorts: +The options controlling \fclang are of four sorts: \begin{itemize} \item options known to the \framac kernel \item options \fcl plug-in has registered with the \fc kernel. These also are recognized by the \fc command-line. \item options known to \irg directly (and not to \fc) must be included in the internal command that invokes \irg using the \option{-cpp-extra-args} option. These options are described in \S\ref{sec:standalone}. +\item \clang options, which must be supplied using the \option{-cpp-extra-args} option, and are passed through \irg to \cl. See \S\ref{sec:standalone}. \end{itemize} -The options known to the \fcl plugin are processed by the \fc kernel when listed on the \fc command-line. +The options in the first two categories are processed by the \fc kernel when listed on the \fc command-line. The use of the \fc command-line is described in the core \fc user guide. There are many kernel options that affect all plugins and many options specific to \fclang. @@ -91,15 +96,15 @@ shows all \fcl-specific options. The most important of the options are these: \begin{itemize} +\item \option{-help} -- introduction to \framac help \item \option{-kernel-h}, \option{-fclang-h} -- help information about \fc, the \fc kernel and the \fcl plug-in -ption{-help}, \item \option{-cpp-extra-args <string>} -- the single string argument to this option is \textit{prepended} to the command-line when \lstinline|frama-clang| is invoked internally. It is particularly important for adding include directories (\lstinline|-I|) and other options to be passed on to the clang compiler or the \irg executable. Multiple instances of this option have a cumulative effect, in order (rather than later instances replacing earlier ones). - \item \option{-print} -- prints out the input file seen by \fc; when \fcl is being used this is the input file after pre-processing and translation from C++ to C. Thus this output can be useful to see (and debug) what the \fcl plug-in ahs done. + \item \option{-print} -- prints out the input file seen by \fc; when \fcl is being used this is the input file after pre-processing and translation from \cpp to \C. Thus this output can be useful to see (and debug) the results of \fcl's transformations. \item \option{-machdep <arg>} -- sets the target machine architecture, cf. \S\ref{sec:bit} \item \option{-fclang-msg-key <categories>} -- sets the amount of informational messages \item \option{-fclang-warn-key <categories>} -- sets the amount and behavior of warnings @@ -109,36 +114,41 @@ ption{-help}, \section{Include directories} -By default \irg is given the paths to the two directories containing the \fcl and \fc header files, which include \acslpp specifications for the C++ library functions. The default paths to these directories are set by the \lstinline|-cxx-c++stdlib-path| and \lstinline|-cxx-cstdlib-path| options. +By default \irg is given the paths to the two directories containing the \fcl and \fc header files, which include \acslpp specifications for the \cpp library functions. The default paths to these directories are set by the \lstinline|-cxx-c++stdlib-path| and \lstinline|-cxx-cstdlib-path| options. Users typically have additional header files for their own projects. These are supplied to the \fcl preprocessor using the option \lstinline|-cpp-extra-args|. -You can use \lstinline|-fclang-cpp-extra-args| instead of \lstinline|cpp-extra-args|; multiple such options also have a cumulative effect. the \fcl option only affects the \fcl plugin, whereas -\lstinline|cpp-extra-args| may be seen by other plugins as well, if such plugins do their own preprocessing. Also note that the presence of any instance of \lstinline|-fclang-cpp-extra-args| will cause uses of \lstinline|cpp-extra-args| to be ignored. - -The system header files supplied by \fcl may not include all C++ system files. Omissions should be reported to the \fc team. +You can use \lstinline|-fclang-cpp-extra-args| instead of \lstinline|cpp-extra-args|; multiple such options also have a cumulative effect. +The \fcl option only affects the \fcl plugin, whereas +\lstinline|-cpp-extra-args| may be seen by other plugins as well, if such plugins do their own preprocessing. Also note that the presence of any instance of \lstinline|-fclang-cpp-extra-args| will cause uses of \lstinline|-cpp-extra-args| to be ignored. +The system header files supplied by \fcl may not include all \cpp system files. Omissions should be reported to the \fc team. +As an example, to perform wp checking of files \lstinline|a.cpp| and \lstinline|inc/a.h|, one might use the command-line \\ +\centerline{\lstinline|frama-c -cpp-extra-args="-Iinc" -wp a.cpp|} \section{32 and 64-bit targets} \label{sec:bit} -\acslpp is for the most part machine-independent. There are some features of C++ that can be environment-dependent, such as the sizes of fundamental datatypes. Consequently, \framac has some options that allow the user to state what machine target is intended. The \fcl options are discussed in \S\ref{sec:fcloptions}. +\acslpp is for the most part machine-independent. There are some features of \cpp that can be environment-dependent, such as the sizes of fundamental datatypes. Consequently, \framac has some options that allow the user to state what machine target is intended. The \fcl options are discussed in \S\ref{sec:fcloptions}. \begin{itemize} -\item The \textbf{-machdep} option to \framac. See the allowed values using teh command \lstinline|frama-c -machdep help|. For example, with a value of x86\_32, \lstinline|sizeof(long)| has a value of 4, whereas with the option \textbf{-machdep x86\_64}, \lstinline|sizeof(long)| has a value of 8. -\item Alternnately, the value of \textbf{-machdep} can be set instead using an environment variable: \lstinline|MACHDEP|. The variable can be set either in the shell environment or on the command line with \lstinline|-DMACHDEP=...| +\item The \textbf{-machdep} option to \framac. See the allowed values using the command\\ +\centerline{ \lstinline|frama-c -machdep help|.} + For example, with a value of \lstinline|x86\_32|, \lstinline|sizeof(long)| has a value of 4, whereas with the option \textbf{-machdep x86\_64}, \lstinline|sizeof(long)| has a value of 8. +\item Alternately, the value of \textbf{-machdep} can be set instead using an environment variable: \lstinline|\_\_FC\_MACHDEP|. The variable can be set either in the shell environment or on the command line with \lstinline|-D\_\_FC\_MACHDEP=...| \end{itemize} \section{Warnings, errors, and informational output} -Output messages arise from multiple places: from the \fcl plugin, from the \irg lexer and parser, from the clang parser, and from the frama-c kernel (as well as from any other plugins that may be invoked, such as wp). They are controlled by a number of options within the \fc kernel and each plugin. Remember that clang and \irg options must be put in the \lstinline|-cpp-extra-args| option. +Output messages arise from multiple places: from the \fcl plugin, from the \irg lexer and parser, from the \clang parser, and from the \framac kernel (as well as from any other plugins that may be invoked, such as wp). They are controlled by a number of options within the \fc kernel and each plugin. Remember that \cl and \irg options must be put in the \lstinline|-cpp-extra-args| option. \subsection{Errors} -Error messages are always output. The key question is whether processing stops or continues with each error. Continuing can result in a cascade of unhelpful error messages, but stopping immediately can hide errors that occur later in source files. +Error messages are always output. The key question is whether processing stops or continues u[on encountering an error. +Continuing can result in a cascade of unhelpful error messages, but stopping immediately can hide errors that occur later in source files. \begin{itemize} \item \lstinline|-Werror| is a clang and \irg option that causes any parser warnings to be treated as errors \item \lstinline|--stop-annot-error| is a \irg option that causes prompt termination on errors (default is to continue) @@ -150,23 +160,23 @@ Error messages are always output. The key question is whether processing stops o The various categories of warnings from \fcl can be seen with the command \lstinline|frama-c -fclang-warn-key help|. Warning messages are emitted by default. -Warning messages from \irg can be controlled with the \lstinline|-warn| option of ]\irg, or, equivalently, the \lstinline|-fclang-warn-key=parse| option of \fc. +Warning messages from \irg can be controlled with the \lstinline|-warn| option of \irg, or, equivalently, the \lstinline|-fclang-warn-key=parse| option of \fc. \begin{itemize} -\item the \irg option \lstinline|--no-warn| or \lstinline|-warn=0|turns off \irg warning messages +\item the \irg option \lstinline|--no-warn| or \lstinline|-warn=0| turns off \irg warning messages \item the \irg option \lstinline|--warn=<n>|, with $n > 0$ turns on \irg warning messages; the higher the value $n$ the more messages \item the \irg option \lstinline|--warn| is the same as \lstinline|--warn=1| \item the \fc option \lstinline|-fclang-warn-key=parse=inactive| is the same as the \irg option \lstinline|--warn=0| \item the \fc option \lstinline|-fclang-warn-key=parse=active| is the same as the \irg option \lstinline|--warn=1| \end{itemize} -The clang options are not currently integrated with the \fc warning and error key system. +\textit{The clang options are not currently integrated with the \fc warning and error key system.} \subsection{Informational output} \textit{This section is not yet written} -The clang informational output is not currently integrated with the \fc warning and error key system. +\textit{The clang informational output is not currently integrated with the \fc warning and error key system.} \chapter{Running the \fclang front-end standalone} \label{sec:standalone} @@ -174,28 +184,59 @@ The clang informational output is not currently integrated with the \fc warning In normal use within \framac, the \irg executable is invoked automatically. However, it can also be run standalone. In this mode it accepts command-line options and a single input file; -it produces a C AST representing the translated C++, in a text format -of the Cil (C Intermediate language). +it produces a C AST representing the translated \cpp, in the Cil (C Intermediate language) text format. + + +\section{\irg specific options} +\label{sec:fcloptions} + +These options that are specific to \irg and must be specified by using \option{-cpp-extra-args}: +\begin{itemize} + \item \option{-o <file>} -- specifies the name and location of the output file (that is, the file to contain the generated AST). The output path may be absolute or relative to the current working directory. \textit{This option is required.} + \item \option{-{-}info=<n>} -- sets the level of informational messages to \textbf{n}; 0 is completely quiet; increasing values are + more verbose. \\ + \option{-{-}info} sets the level to 1 \\ + \option{-{-}no-info} sets the level to 0\\ + The \lstinline|frama-c| option \option{-fclang-msg-key=parse} is equivalent to setting a value of 1. + + \item \option{-{-}warn=<n>} -- sets the level of parser warning messages to \textbf{n}; 0 is completely quiet; increasing values are +more verbose. \option{-{-}warn} sets the level to 1\\ +\option{-{-}no-warn} sets the level to 0\\ +The \lstinline|frama-c| option \option{-fclang-warn-key=parse} is equivalent to setting a value of 1. + + \item \option{-{-}debug=<n>} -- sets the level of parser debug messages to \textbf{n}; 0 is completely quiet; increasing values are +more verbose\\ + \option{-{-}debug} sets the level to 1\\ +\option{-{-}no-debug} sets the level to 0\\ +The \lstinline|frama-c| option \option{-fclang-debug=<n>} is equivalent to setting a value of \textbf{n}. +In particular, a debug value of 1 shows the command-line that invokes \irg. + + \item \option{-{-}stop-annot-error} -- if set, then parsing stops on the first error; default is off + \item \option{-{-}exit-code} -- if set, then the exit code of \irg is 1 if errors occur; this is not the default because then \lstinline|frama-c| would terminate upon + any error in \lstinline|framaCIRGen| + \item \option{-{-}no-exit-code} -- disables \option{-{-}exit-code}, so that the exit code is always 0. + +\end{itemize} \section{Clang options} -Frama-Clang is built on the Clang C++ parser. +Frama-Clang is built on the \clang \cpp parser. As such, the \irg executable accepts the clang compiler options and passes them on to clang. There are many of these. Many of these are irrelevant to \fcl as they relate to code generation, whereas \fcl only uses clang for parsing, name -and type resolution. +and type resolution, and producing the AST. You can see a list of options by running \lstinline|framaCIRGen -help| -The most significant options for \irg are these: +The most significant \cl options are these: \begin{itemize} \item \option{-I <dir>} -- adds a directory to the include file search path. Using absolute paths is recommended; relative paths are relative to the current working directory. - \item \option{-o <file>} -- specifies the name and location of the output file (that is, the file to contain the generated AST). The output path may be absolute or relative to the current working directory. \textit{This option is required.} - \item \option{-std=...} -- the input programming language (default is \lstinline|-std=c++11|) \item \option{-{-}version} -- print version information \end{itemize} +Although \clang can process languages other than \cpp, \cpp is the only one usable with \fclang. + %\section{Custom ASTs} %\lstset{keepspaces=true} %In standard mode, \framac invokes \irg on a file, producing an AST in intermediate format, and the reads that intermediate file into \framac to complete the processing. @@ -219,37 +260,3 @@ The most significant options for \irg are these: %where \lstinline|ct| is an executable similar to\\ %\centerline{\bf \texttt{for f in \$@ ; do cat \$\{f\%.*\}.ast ; done}} -\section{\irg specific options} -\label{sec:fcloptions} - -There are also options that are specific to \irg. -These are listed below: -\begin{itemize} - \item \option{-o <file>} -- specifies the given file as the output file for the translated AST (file is overwritten) - \item \option{-{-}info=<n>} -- sets the level of informational messages to \textbf{n}; 0 is completely quiet; increasing values are - more verbose. \\ - \option{-{-}info} sets the level to 1 \\ - \option{-{-}no-info} sets the level to 0\\ - The \lstinline|frama-c| option \option{-fclang-msg-key=parse} is equivalent to setting a value of 1. - - \item \option{-{-}warn=<n>} -- sets the level of parser warning messages to \textbf{n}; 0 is completely quiet; increasing values are -more verbose. \option{-{-}warn} sets the level to 1\\ -\option{-{-}no-warn} sets the level to 0\\ -The \lstinline|frama-c| option \option{-fclang-warn-key=parse} is equivalent to setting a value of 1. - - \item \option{-{-}debug=<n>} -- sets the level of parser debug messages to \textbf{n}; 0 is completely quiet; increasing values are -more verbose\\ - \option{-{-}debug} sets the level to 1\\ -\option{-{-}no-debug} sets the level to 0\\ -The \lstinline|frama-c| option \option{-fclang-debug=<n>} is equivalent to setting a value of \textbf{n}. -In particular, a debug value of 1 shows the command-line that invokes \irg. - - \item \option{-{-}stop-annot-error} -- if set, then parsing stops on the first error; default is off - \item \option{-{-}exit-code} -- if set, then the exit code of \irg is 1 if errors occur; this is not the default because then \lstinline|frama-c| would terminate upon - any error in \lstinline|framaCIRGen| - \item \option{-{-}no-exit-code} -- disables \option{-{-}exit-code}, so that the exit code is always 0. - \item \option{-{-}gen-impl-method} -- TODO - -\end{itemize} - - diff --git a/doc/userman/fclangversion.tex b/doc/userman/fclangversion.tex index 15000469..2da020d3 100644 --- a/doc/userman/fclangversion.tex +++ b/doc/userman/fclangversion.tex @@ -1,3 +1,4 @@ -\newcommand{\version}{0.1\xspace} -\newcommand{\fclangversion}{DRAFT\xspace} +\newcommand{\version}{0.1-REVIEW\xspace} +\newcommand{\fclangversion}{0.1.0-REVIEW\xspace} \newcommand{\fcversion}{Potassium+\xspace} +\newcommand{\clangversion}{7.0} diff --git a/doc/userman/grammar.tex b/doc/userman/grammar.tex index c58e3dcf..2378f22f 100644 --- a/doc/userman/grammar.tex +++ b/doc/userman/grammar.tex @@ -8,12 +8,12 @@ This material may be useful for developers and maintainers of \fclang; it is not Recall that \fclang uses clang to parse \lang and a custom parser to parse \acslpp annotations, jointly producing a representation of the \lang and \acslpp source input in the Frama-C intermediate language. The first version of the \acslpp custom parser, written during the STANCE project, used a hand-written bison-like parser, but with function pointers to record state and actions rather than using a tool-generated table to drive the parsing. This design proved to be too brittle and difficult to efficiently evolve as new features were added to \acslpp. -Consequently during the VESSEDIA project, the scanner and parser were completely rewritten, largely retaining the previous design's connections to clang, token definitions, and name lookup and type resolution. +Consequently during the VESSEDIA project, the scanner and parser were completely rewritten, largely retaining the previous design's connections to clang, token definitions, name lookup and type resolution, ad AST generation. The new parser uses a recursive descent design in which the names of functions doing the parsing can match the names of non-terminals in the grammar. Consequently the implementation of the parser is much more readable, human checkable, and modifiable as the \acslpp language evolves. The drawback of this design is that \acslpp is not LL(1); it is not even LL(k) for fixed k. -Thus some amount of backtracking is required. The offending features are mostly inherited from \lang. +Thus some amount of backtracking is required. The next few subsections describe the problematic aspects of \acslpp and how they are addressed. The principal goal of an LL(k) formulation of a grammar is to be able to predict which grammar production is being seen in the input stream from a small amount of look-ahead. @@ -32,11 +32,11 @@ This makes it not possible to distinguish terms and predicates in top-down parsi However, Frama-C has different AST nodes for the two, so it would require a very significant refactoring of Frama-C and all its plugins to unify terms and predicates (as other specification languages have done). Note that this problem is a challenge for any parser design. The previous and current parser designs adopted the same solution: maintain two parallel parses of expressions --- one as a term and one as a predicate. -Error messages are emitted only when both parses fail or when a particular grammar production calls for a particular type (term or predicate) and that one is not available. +Error messages are emitted only when both parses fail or when a particular grammar production calls for a particular type of AST (term or predicate) and that one is not available. \item \textbf{terms vs. tsets}. Similarly, the ACSL++ language definition defines tsets (sets of locations) with grammar productions separately from terms. However, the grammars for the two are very similar. -It is much easier to parse and to implement if tsets are seen as terms with a specific type, namely sets. +\acslpp is much easier to parse and to implement if tsets are seen as terms with a specific type, namely sets. Many operations on a data type are also simply element-by-element operations on sets of such data types. Also, errors found during type-checking can be associated with more readable error messages than those found during parsing. @@ -49,14 +49,17 @@ When a left parenthesis is parsed in an expression context, the parser proceeds If such a parse fails, no error messages are emitted; rather the parse is rewound and proceeds again assuming the token sequence to be a parenthesized expression. -\item \textbf{set comprehension}. The syntax of the set comprehension expression follows traditional mathematics: \lstinline: { $expr$ | $binders$ ; $predicate$ }:. This structure poses two difficulties for parsers. First, the expression may contain bitwise-or operators, so it is not known to the parser whether an occurence of \verb:|: is the beginning of the binders or is just a bitwise-or operator. Second, the expression will use the variables declared in the binders section. However, the binders are not seen until after the expression is scanned. Note that these problems are not unique to a recursive descent design; they would challenge a LR parser just as much. \textit{This particular feature is not yet implemented in \fclang, nor in Frama-C and so is not yet resolved in the parser implementation.} +\item \textbf{set comprehension}. The syntax of the set comprehension expression follows traditional mathematics: \lstinline: { $expr$ | $binders$ ; $predicate$ }:. This structure poses two difficulties for parsers. First, the expression $expr$ may contain bitwise-or operators, so it is not known to the parser whether an occurence of \verb:|: is the beginning of the binders or is just a bitwise-or operator. Second, the expression will use the variables declared in the binders section. However, the binders are not seen until after the expression is scanned. Note that these problems are not unique to a recursive descent design; they would challenge a LR parser just as much. \textit{This particular feature is not yet implemented in \fclang, nor in Frama-C and so is not yet resolved in the parser implementation.} \item \textbf{labeled expressions}. \acslpp allows expressions to have labels, designated by a \lstinline|id : | prefix. So the parser cannot know whether an initial \lstinline|id| is a variable or just a label until the colon is parsed. Thus this situation requires a lookahead of 2 tokens. -Ambiguity arises with the use of a colon for the else part of a conditional expression. So in an expression such as -\lstinline| a ? b ? c : d : e : f|, it is not know whether \lstinline|c| or \lstinline|d| or \lstinline|e| is a label. Parenthesizing must be used to solve this problem. \fcl presumes that if the then part of a conditional expression is being parsed, a following colon is always the colon introducing the else part. That is, the binding to a conditional expression has tighter precedence than to a naming expression. +Ambiguity arises with the use of a colon for the else part of a conditional expression. +So in an expression such as +\lstinline| a ? b ? c : d : e : f|, it is ambiguous whether \lstinline|c| or \lstinline|d| or \lstinline|e| is a label. +Parenthesizing must be used to solve this problem. +\fcl presumes that if the \textit{then} part of a conditional expression is being parsed, a following colon is always the colon introducing the \textit{else} part. That is, the binding to a conditional expression has tighter precedence than to a naming expression. \end{itemize} diff --git a/doc/userman/introduction.tex b/doc/userman/introduction.tex index 9ec919c7..85aa5ac1 100644 --- a/doc/userman/introduction.tex +++ b/doc/userman/introduction.tex @@ -1,24 +1,25 @@ \chapter{Introduction} -\framac~\cite{userman,fac15} is a modular analysis framework for the C +\framac~\cite{userman,fac15} is a modular analysis framework for the \C programming language that supports the \acsl specification language~\cite{acsl}. This manual documents the \fclang plug-in of \framac, version \fclangversion. -The \fclang plug-in supports the \acslpp extension of \acsl for C++ programs and specifications; -it is built on the Clang\footnote{https://clang.llvm.org/} compiler infrastructure and uses Clang for -parsing C++. The plug-in extends clang to parse \acslpp, translating it into an intermediate language for C and \acsl. +The \fclang plug-in supports the \acslpp extension of \acsl for \cpp programs and specifications; +it is built on the \clang\footnote{https://clang.llvm.org/} compiler infrastructure and uses \clang for +parsing C++. The plug-in extends clang to parse \acslpp, translating it into an intermediate language for \C and \acsl. -The \fclang plug-in intends to provide a full translation of C++ and \acslpp into the \framac internal representation, and from there to allow C++ progems and \acslpp specifications to be analyzed by other \framac plug-ins. +The \fclang plug-in intends to provide a full translation of \cpp and \acslpp into the \framac internal representation, and from there to allow \cpp programs and \acslpp specifications to be analyzed by other \framac plug-ins. \textit{This is a work in progress.} The following sections describe the current status and limitations of the current implementation. \begin{itemize} - \item The plug-in aims for the C++11 version of C++ - \item \acslpp is described in the companion \acslpp reference manual \cite[acslpp], also a part of the \framac release. - \item The plug-in is compatible with version 8 of Clang. - This version of Clang supports C++ versions through C++17 - (cf. \url{https://clang.llvm.prg/cxx_status.;html}. - However, \fclang may not support all of the features of C++ within annotations. + \item The plug-in aims for the C++11 version of \cpp + \item \acslpp is described in the companion \acslpp reference manual \cite{acslpp}, also a part of the \framac release. + \item The plug-in is compatible with version 8 of \clang. + This version of \clang supports \cpp versions through C++17 + (cf. \url{https://clang.llvm.prg/cxx_status.html}). + However, \fclang may not support all of the features of \cpp within annotations. \end{itemize} -The source material for this document is in Frama-C's gitlab repository: \url{git@git.frama-c.com:frama-c/frama-clang.git}, in directory \lstinline|doc/userman| . +The source material for this document is in Frama-C's gitlab repository:\\ \centerline{\url{git@git.frama-c.com:frama-c/frama-clang.git},} \\ + in directory \lstinline|doc/userman|. diff --git a/doc/userman/limitations.tex b/doc/userman/limitations.tex index e0ca857c..3132abc8 100644 --- a/doc/userman/limitations.tex +++ b/doc/userman/limitations.tex @@ -2,8 +2,8 @@ The development of the \fclang plug-in is still ongoing. \fclang does not implement all of current C++ nor all of -\acslpp as defined in its language definition.\cite{acslpp} -The mosty important such limitations are described in this section. +\acslpp as defined in its language definition\cite{acslpp}. +The most important such limitations are listed in this section. \section{Implementation of C++} @@ -11,16 +11,19 @@ The mosty important such limitations are described in this section. The following C++ features are not implemented in \acslpp. \begin{itemize} -\item \acslpp definitions within template declarations -\item digraphs within \acslpp annotations \item preprocessing is restricted within \acslpp annotations (cf. \S\ref{sec:preprocessing}) \end{itemize} \section{Implementation of \acslpp} +These \acslpp features are not yet implemented \begin{itemize} +\item \acslpp definitions within template declarations +\item \acslpp specification for standard \cpp library functions are still quite limited \item ghost declarations \item model declarations \item set comprehensions +\item frama-clang info/warn/error messages are not yet properly categorized and integrated with -fclang-log, -fclang-msg-key, fclang-warn-key. In particular, clang messages are completely independent + \end{itemize} diff --git a/doc/userman/macros.tex b/doc/userman/macros.tex index 60c23ce5..9de4399c 100644 --- a/doc/userman/macros.tex +++ b/doc/userman/macros.tex @@ -6,12 +6,20 @@ \newcommand{\framac}{\textsc{Frama-C}\xspace} \newcommand{\fclang}{\textsc{Frama-Clang}\xspace} +\newcommand{\clang}{\textsc{clang}\xspace} \newcommand{\acsl}{\textsc{ACSL}\xspace} \newcommand{\acslpp}{\textsc{ACSL++}\xspace} \newcommand{\acslb}{\acsl/\acslpp\xspace} + +\newcommand{\irg}{\lstinline|framaCIRGen|\xspace} +\newcommand{\fcl}{\lstinline|frama-clang|\xspace} +\newcommand{\fc}{\lstinline|frama-c|\xspace} +\newcommand{\cl}{\lstinline|clang|\xspace} + \newcommand{\rte}{\textsc{RTE}\xspace} \newcommand{\C}{\textsc{C}\xspace} +\newcommand{\cpp}{\textsc{C++}\xspace} \newcommand{\jml}{\textsc{JML}\xspace} \newcommand{\Eva}{\textsc{Eva}\xspace} \newcommand{\variadic}{\textsc{Variadic}\xspace} @@ -46,9 +54,6 @@ \newcommand{\todo}[1]{{\large \textbf{TODO: #1.}}} \newcommand{\option}[1]{\textbf{#1}} -\newcommand{\irg}{\lstinline|framaCIRGen|\xspace} -\newcommand{\fcl}{\lstinline|frama-clang|\xspace} -\newcommand{\fc}{\lstinline|frama-c|\xspace} \newcommand{\markdiff}[1]{{\color{blue}{#1}}} \newenvironment{markdiffenv}[1][]{% diff --git a/doc/userman/main.tex b/doc/userman/main.tex index 58242ada..c26c5d27 100644 --- a/doc/userman/main.tex +++ b/doc/userman/main.tex @@ -75,27 +75,44 @@ which received funding from the European Union's 2020 Research and Innovation Program under grant agreement No. 731453. + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\include{introduction} +\include{description} +\include{limitations} + +\include{preprocessing} +\include{grammar} + + +%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% + +\appendix + +\include{changes} + \section*{TODO} These items are not yet addressed \begin{itemize} -\item (Introduction) Verify the version of Clang supported -\item The custom build instructions with a prefix to a temp directory fail +\item Add/revise list of conntributors/reviewers +\item (Introduction) Verify the version of \framac and \clang supported \item Sort out the interactions among the options -verbose -debug -quiet -fclang-debug -fclang-verbose -fclang-msg-key -fclang-warn-key -\item Sort out and describe interactions among -annot -pp-annot and frama-clang -\item frama-clang info/warn/error messages are not yet properly categorized and integrated with -fclang-log, -fclang-msg-key, fclang-warn-key. In particular, clang messages are completely independent +\item Sort out and describe interactions among -annot -pp-annot and frama-clang options \item Add comments about using system header files \item Remove \$FRAMAC\_SHARE from the default command-line, and builtin.h from \$FRAMAC\_SHARE \item Comment on the demangling options -\item Comment on printing out the preprocessed file -\item \irg options are not included in -fclang-h -\item -print option to \irg \item Review the clang and \irg options -\item Fix how to do custom ASTs \item decribe the framaCIRGen options: -x -b -v --gen-impl-meth \item Comment on handling of unicode in the preprocessor \item Review the preprocessor tokens \item Various parts of the tool use stdout and stderr differently +\item \fclang cannot process annotations that are separate from the source file +\item must the argument of -cpp-extra-args always be given with an = and quoted string +\item Interactions between \_\_FC\_MACHDEP and -machdep +\item Check whether static or dynamic linking is used in building \end{itemize} Bugs @@ -104,29 +121,15 @@ Bugs \item non-clang options not included in framaCIRGen -help \item The -o option is required \item What is the -w option +\item Fix how to do custom ASTs; The custom build instructions with a prefix to a temp directory fail +\item \irg options are not included in -fclang-h + \item framaCIRGen -o prints help summary twice -\item it appears that trigraph processing cannot be turned off, is on by default for framaCIRGen but off by default for clang++; not sure why -\item it does not seem that clang options are passed through, e.g. -fdollars-in-identifiers; framaCIRGen defaults are different than clang -\item preprocessor does not concatenate string literals +\item it appears that trigraph processing cannot be turned off, is on by default for \irg but off by default for clang++; not sure why +\item it does not seem that clang options are passed through, e.g. -fdollars-in-identifiers; \irg defaults are different than clang +\item preprocessor implementation does not concatenate string literals \end{itemize} - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\include{introduction} -\include{description} -\include{limitations} - -\include{preprocessing} -\include{grammar} - - -%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% - -\appendix - -\include{changes} - %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \cleardoublepage diff --git a/doc/userman/preprocessing.tex b/doc/userman/preprocessing.tex index 0b53ee57..d3a53d89 100644 --- a/doc/userman/preprocessing.tex +++ b/doc/userman/preprocessing.tex @@ -1,53 +1,62 @@ \chapter{Preprocessing} \label{sec:preprocessing} -This section describes the implementation of the C++ preprocessor for \acslpp -annotations. Recall that the C++ preprocessor replaces comments (including +This section describes the implementation of the \cpp preprocessor for \acslpp +annotations. Recall that the \cpp preprocessor replaces comments (including \acslpp comments) by white space, without operations such as handling preprocessor directives. Accordingly, \fclang must handle standard preprocessing within \acslpp annotations itself. -As a refresher, the C/C++ preprocessor (CPP) (cf. \url{https://gcc.gnu.org/onlinedocs/cpp/}) conceptually implements the following operations on a C++ source file: +As a refresher, the \C/\cpp preprocessor (CPP) (cf. \url{https://gcc.gnu.org/onlinedocs/cpp/}) conceptually implements the following operations on a \cpp source file: \begin{itemize} - \item The input is read and broken into a sequence of physical lines according to the line terminators (ASCII character sequences \verb|\r|, \verb|\n|, or \verb|\r\n|). - \item Each C trigraph is replaced by its corresponding character (if trigraph processing is enabled). - \item Any backslash-whitespace-line-terminator sequence is removed and the line that it ends is combined with the following line. + \item The input is translated into a basic set of characters, including replacing trigraph sequences by their source character set equivalents + \item Any backslash-whitespace-line-terminator sequence is removed and the line that it ends is combined with the following line, producing a sequence of logical lines. \item Comments are replaced by single spaces. This requires tokenizing the input to avoid recognizing comment markers within strings as indicating a comment. Note that this allows block comments to hide line terminations. \item The input text is divided into preprocessing tokens grouped in logical lines. Each preprocessor token becomes a compiler token (except where \#\# concatenation occurs). However, \acslb tokens are slightly different, as described below. - \item The source text is transformed according to any preprocessing directives contained within it. Each preprocessing directive must be contained within one logical line. The result has no preprocessing directives remaining. + \item The source text is transformed according to any preprocessing directives contained within it. + Each preprocessing directive must be contained within one logical line. The result has no preprocessing directives remaining. \item Adjacent string literals (separated only by white-space or line-endings) are concatenated into a single string literal. \end{itemize} The result is a sequence of preprocessing tokens that is passed on to the remaining compiler phases. -\subsection{Trigraphs} +\section{\fclang preprocessor implementation} +The \fclang implementation operates as follows, on each \acslpp annotation comment in turn: +\begin{itemize} +\item A simplified custom lexer converts the text into preprocessor tokens, without doing macro substitution, to find instances of forbidden preprocessor directives. If possible and reasonable, these are elided from the input text and processing continues. +\item The text is then submitted to \cl to obtain the complete sequence of preprocessor tokens, now with full preprocessing (except for adjacent string concatenation). +\item \fcl transforms the clang preprocessor tokens into \acslpp tokens, which are then passed on to the \acslpp parser to produce the desired AST. +\end{itemize} + +\section{Trigraphs} -Trigraphs are defined for C++ but will currently be removed in C++17. Since trigraph processing by clang occurs before any recognition of comments, trigraphs in \acslpp annotations are translated. As they will be removed from C++, they are not recommended for use in \acslpp annotations. +Trigraphs are defined for \cpp but will currently be removed in C++17. Since trigraph processing by clang occurs before any recognition of comments, trigraphs in \acslpp annotations are translated, if enabled in \clang. As they will be removed from \cpp, they are not recommended for use in \acslpp annotations. -\subsection{Digraphs} +\section{Digraphs} Digraphs are alternate spellings of preprocessor tokens, in particular, of -punctuation character sequences. Digraphs in \acslpp annotations are translated just as they are in C++ (by clang). +punctuation character sequences. Digraphs in \acslpp annotations are translated just as they are in \cpp (by clang). -\subsection{Preprocessor tokens} +\section{Preprocessor tokens} Preprocessor tokens (per CPP) belong to one of five categories. White space (space, tab) serves only to separate tokens; it is not needed between tokens whose concatenation is not a single token. Line terminators also separate tokens and also delineate certain features: preprocessing directives and string literals do not extend over more than one (logical) line. \begin{itemize} \item identifiers: character sequences that match the regular expression \\ -\centerline{\texttt{[a-zA-Z\_][a-zA-Z\_0-9]*}} \\ Dollar signs are allowed as non-digit identifier characters if the clang option \texttt{-fdollars-in-identifiers} is enabled. -Enable with \texttt{-fdollars-in-identifiers} ; disable with \texttt{-fno-dollars-in-identifiers} ; enabled by default. +\centerline{\texttt{[a-zA-Z\_][a-zA-Z\_0-9]*}} \\ +Dollar signs are also allowed as non-digit identifier characters if the clang option\\ \texttt{-fdollars-in-identifiers} is enabled, which it is by default. +Enable with \texttt{-fdollars-in-identifiers} ; disable with \texttt{-fno-dollars-in-identifiers} . \item number: character sequences that match the regular expression \\ \centerline{\texttt{[.]?[0-9]([0-9a-zA-Z.]|[eEpP][+-]))*}} \\ - \item string literals: character sequence enclosed in " " or ' ' or < >, with \textbackslash " for " in a double-quoted literal (that is not a header file name) and \' for ' in a single-quoted literal. After all other preprocessor phases are complete, the preprocessort concatenates adjacent string literals. + \item string literals: character sequence enclosed in " " or ' ' or < >, with \textbackslash " for " in a double-quoted literal (that is not a header file name) and \textbackslash ' for ' in a single-quoted literal, and \textbackslash\textbackslash for \textbackslash in either. After all other preprocessor phases are complete, the preprocessor concatenates adjacent string literals. \item punctuator: all single non-alphanumeric printable characters except -\verb|@|, \verb|#| and \verb|`|, and all multi-punctuation sequences that meaningful to C/C++ (e.g. > and >{>}= ), but not an arbitrary multi-punctuation character sequence. +\verb|@|, \verb|#| and \verb|`|, and all multi-punctuation sequences that meaningful to \C/\cpp (e.g. > and >{>}= ), but not an arbitrary multi-punctuation character sequence. \item other tokens: \verb|@|, \verb|#|, \verb|`| and all non-ASCII single characters. \end{itemize} The above token definitions imply that arbitrary text can always be broken into legitimate preprocessor tokens, with the exception of a few characters and of badly formed unicode sequences. -Note that not all preprocessor tokens are valid C/C++ parser tokens. Tokens in the other category have no meaning to C/C++ and the \texttt{number} category allows many sequences that are not legal C/C++ numeric tokens. These tokens will generally provoke compiler errors. For example in C/C++, 0..2 is one token and is not interpreted as two consecutive numeric tokens. +Note that not all preprocessor tokens are valid \C/\cpp parser tokens. Tokens in the other category have no meaning to \C/\cpp and the \texttt{number} category allows many sequences that are not legal \C/\cpp numeric tokens. These tokens will generally provoke compiler errors. For example in \C/\cpp, 0..2 is one token and is not interpreted as two consecutive numeric tokens. \acsl and \acslpp have slightly different tokens than the above, so the preprocessor tokens need to be re-tokenized in some cases: \begin{itemize} @@ -55,7 +64,7 @@ Note that not all preprocessor tokens are valid C/C++ parser tokens. Tokens in t \item There are some \acslb multi-character punctuator tokens that are not single preprocessor tokens: \begin{itemize} - \item[] all \acslb keywords that begin with a backslash, such as \verb|\result|. + \item all \acslb keywords that begin with a backslash, such as \verb|\result|. \item \verb|==>| (logical implies) \item \verb|-->| (bit-wise implies) \item \verb|<==>| (logical equality) @@ -65,16 +74,16 @@ Note that not all preprocessor tokens are valid C/C++ parser tokens. Tokens in t \item \verb:[|: and \verb:|]: (list creation) \end{itemize} These \acslb tokens need to be assembled from multiple CPP tokens (and those CPP tokens must not be separated by white space) - \item A CPP numeric token that contains .. will not be a legal C/C++ number, but may be a sequence of + \item A CPP numeric token that contains .. will not be a legal \C/\cpp number, but may be a sequence of legal \acslb tokens with the .. representing the range operator. For example, the single CPP token \texttt{0..last} is retokenized for \acslb as if it were written \texttt{0 .. last} . \item \acslb allows certain built-in non-ASCII symbols. An example is - $\forall$ (unicode 0x2200) x to designate a universal quantifier, + $\forall$ (unicode 0x2200) to designate a universal quantifier, which is an alternative form of \verb|\forall|. A complete list of such tokens is given in the \acslb language definition. \end{itemize} -\subsection{Preprocessor directives} +\section{Preprocessor directives} A preprocessing directive consists of a single logical line (after the previous preprocessing phases have been completed) that begins with optional white space, the \verb|#| character, additional optional white space, and a preprocessor directive identifier. The preprocessing language contains a fixed set of preprocessing directive identifiers: \begin{itemize} @@ -85,17 +94,28 @@ The preprocessing language contains a fixed set of preprocessing directive ident \item \texttt{line} \item \texttt{pragma} \end{itemize} -In addition, identifiers that have been defined (by a \#define directive) as macros are expanded according to the macro expansion rules (not described here). +In addition, identifiers that have been defined (by a \texttt{\#define} directive) as macros are expanded according to the macro expansion rules (not described here). -Because \acslb annotations are contained in C/C++ comments, -any directives contained in the annotation are not seen when the source file is processed simply as a C/C++ source file. However, the effect of some directives lasts until the end of the source file. +Because \acslb annotations are contained in \C/\cpp comments, +any directives contained in the annotation are not seen when the source file is processed simply as a \C/\cpp source file. However, the effect of some directives lasts until the end of the source file. Accordingly, \acslpp imposes constraints on the directives that may be present within annotations: \begin{itemize} - \item \texttt{define} and \texttt{undef} are not permitted in an annotation - \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} are permitted but must be completely nested within the annotation in which they appear (an \texttt{endif} and its corresponding \texttt{if}, \texttt{ifdef}, or \texttt{ifndef} directive must be in the same annotation comment.) - \item \texttt{warning} and \texttt{error} are permitted - \item \texttt{include} is permitted, but will cause errors if it contains, as is almost always the case, either \texttt{define} or \texttt{pragma} directives - \item \texttt{line} is not permitted - \item \texttt{pragma} is not permitted + \item \texttt{define} and \texttt{undef} are not permitted in an annotation. (If they were to be allowed, their scope would have to extend only to the end of the annotation, which could be confusing to readers.) + \item macros occurring in an annotation but defined by \texttt{define} statements prior to the annotation are expanded according to the normal rules, including concatenation by \texttt{\#\#} operators. + The context of macro definitions corresponds to the textual location of the annotation, as would be the case if the + annotation were not embedded in a comment. + \item \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, \texttt{elif}, \texttt{else}, \texttt{endif} are permitted but must be completely nested within the annotation in which they appear (an \texttt{endif} and its corresponding \texttt{if}, \texttt{ifdef}, \texttt{ifndef}, or \texttt{elif} must both be in the same annotation comment.) + \item \texttt{warning} and \texttt{error} are permitted + \item \texttt{include} is permitted, but will cause errors if it contains, as is almost always the case, other disallowed directives + \item \texttt{line} - is not permitted + \item \texttt{pragma} and the \texttt{\_Pragma} operator are not permitted + \item stringizing (\verb|#|) and string concatenation (\verb|##|) operators are permitted + \item the \verb|defined| operator is permitted + \item the standard predefined macro names are permitted: + \texttt{\_\_cpluscplus} (in \cpp compilers), + \texttt{\_\_DATE\_\_}, + \texttt{\_\_LINE\_\_}, + \texttt{\_\_TIME\_\_}, + \texttt{\_\_STDC\_HOSTED\_\_} \end{itemize} -- GitLab