Commit 53ae2438 authored by Michele Alberti

[doc] Update the manual.

Our file looks like this so far:
   end
We would like to verify our property given a certain neural network. To do
this, CAISAR provides WhyML extensions to recognize and apply neural networks
in ONNX and NNet formats on vector inputs. Given a file in one of these
formats, CAISAR is able to provide the following:
* a logical symbol of type ``model nn``, built using the ``read_model``
  function, of type ``string -> model nn``. The first argument is the path to
  the neural network file, and ``model nn`` is the type of the neural network
  in WhyML;
* a function symbol that returns the output of the application of the neural
  network to a given input;
* types and predicates to manipulate input vectors.
The full reference for those WhyML extensions is available under the
`stdlib/caisar/types.mlw
<https://git.frama-c.com/pub/caisar/-/blob/master/stdlib/caisar/types.mlw>`_
and `stdlib/caisar/model.mlw
<https://git.frama-c.com/pub/caisar/-/blob/master/stdlib/caisar/model.mlw>`_
files. To create a logical symbol for a neural network located in
``nets/onnx/ACASXU_1_1.onnx``, we can import the relevant theories in our file
and use the ``read_model`` function symbol like this:
.. code-block:: whyml

   theory ACASXU_P1
     use ieee_float.Float64
     use caisar.types.Vector
     use caisar.model.Model
     use caisar.model.NN

     constant nn_1_1: model nn = read_model "nets/onnx/ACASXU_1_1.onnx"
   end
Now is the time to define our verification goal, that will call ``P1_1_1`` for

model the inputs of the neural network :math:`\rho, \theta, \psi, v_{own},
v_{int}` to the range of floating-point values each may take. We can do that
by writing a predicate that encodes those specification constraints. Since
neural networks take vectors as inputs, we use the CAISAR library
``caisar.types.Vector``.
We can write this as a predicate for clarity:

.. code-block:: whyml
   theory ACASXU_P1
     use ieee_float.Float64
     use caisar.types.Vector
     use caisar.model.Model
     use caisar.model.NN

     constant nn_1_1: model nn = read_model "nets/onnx/ACASXU_1_1.onnx"

     predicate valid_input (i: vector t) =
       (0.5999999999999999777955395074968691915273666381835937500000000000:t) .<= i[0] .<= (0.6798577687000000313588543576770462095737457275390625000000000000:t)
properties and the actual specifications.
We must then define the result of the application of ``nn_1_1`` on the inputs.
The built-in function ``@@`` serves this purpose. Its type, ``model nn ->
vector 'a -> vector 'a``, describes what it does: given a neural network of
type ``model nn`` and an input vector ``x``, return the vector that is the
result of the application of the network on ``x``. Note that thanks to type
polymorphism, ``@@`` can be used to describe a variety of input vectors,
including floating points, integers, or strings. We can finally define the
output constraint we want to enforce on the first coordinate of the output
vector that we use to model the advisory *COC*. We use the WhyML extension
predicate ``has_length`` to further check that our inputs are of valid length.
The final WhyML file ``property_1.why`` looks like this:

.. code-block:: whyml
   theory ACASXU_P1
     use ieee_float.Float64
     use caisar.types.Vector
     use caisar.model.Model
     use caisar.model.NN

     constant nn_1_1: model nn = read_model "nets/onnx/ACASXU_1_1.onnx"

     predicate valid_input (i: vector t) =
       let rho = i[0] in
     goal P1_1_1:
       forall i: vector t. has_length i 5 -> valid_input i ->
       let coc = (nn_1_1 @@ i)[0] in
       coc .<= (3.9911256458999999630066213285317644476890563964843750000000000000:t)
   end
A more general formulation of all ACAS-Xu properties is available in
`acasxu.why
<https://git.frama-c.com/pub/caisar/-/blob/master/examples/acasxu/acasxu.why>`_.
The corresponding neural networks in ONNX format are available under the
folder `/examples/acasxu/nets/onnx/
<https://git.frama-c.com/pub/caisar/-/blob/master/examples/acasxu/nets/onnx/>`_.
Verifying the property with CAISAR
==================================

verifying the ACAS-Xu property :math:`\phi_1`:
.. code-block:: console

   $ caisar verify --prover Marabou property_1.why -t 10m
   [caisar] Goal P1_1_1: Timeout

.. code-block:: console

   $ caisar verify --prover nnenum property_1.why -t 10m
   [caisar] Goal P1_1_1: Valid
Note that the previous commands set the verification time limit to 10 minutes

networks :math:`N_{1,1}` and :math:`N_{2,7}` [Katz2017]_.
From the modelling standpoint, the most evident difference concerns the
desired output property, meaning that *COC* should not be the minimal value. A
straightforward way to express this property is that the neural network output
is greater than or equal to at least one of the other four outputs. We can
write an ``is_min`` predicate that, for a vector ``o`` and int ``i``, states
whether the :math:`i`-th coordinate of ``o`` is the minimal value:
.. code-block:: whyml

   predicate is_min (o: vector t) (i: int) =
     forall j: int. 0 <= j < 5 -> i <> j -> o[i] .<= o[j]
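For intuition only, the semantics of this predicate can be mirrored outside
WhyML. The following Python sketch (with a hypothetical score vector, not part
of CAISAR) checks whether coordinate ``i`` holds the minimal value of a
5-element output vector:

```python
def is_min(o, i):
    # Mirror of the WhyML predicate: o[i] is minimal iff it is <= every
    # other coordinate of the 5-element output vector.
    return all(o[i] <= o[j] for j in range(5) if j != i)

# Hypothetical output scores for the five advisories (COC at index 0).
scores = [0.2, -0.1, 0.5, 0.3, 0.4]

print(is_min(scores, 0))  # False: COC is not the minimal output here
print(is_min(scores, 1))  # True: index 1 holds the minimal score
```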
We want to check the same property for two different neural networks. Of
course, we could define a theory with two identical but distinct verification
goals, or two entirely distinct theories in the same WhyML file. However,
these two solutions are not advisable in terms of clarity and maintainability.
Reassuringly enough, CAISAR provides all necessary features to come up with a
better solution. Indeed, we can use the ``read_model`` extension to define as
many symbols as needed to represent distinct neural networks.
In the end, the WhyML file ``property_3.why`` looks like this:

.. code-block:: whyml
   theory ACASXU_P3
     use int.Int
     use ieee_float.Float64
     use caisar.types.Vector
     use caisar.model.Model
     use caisar.model.NN

     constant nn_1_1: model nn = read_model "nets/onnx/ACASXU_1_1.onnx"
     constant nn_2_7: model nn = read_model "nets/onnx/ACASXU_2_7.onnx"

     predicate valid_input (i: vector t) =
       let rho = i[0] in
     goal P3_1_1:
       forall i: vector t.
         has_length i 5 -> valid_input i ->
         let is_coc_min = is_min (nn_1_1 @@ i) 0 in
         not is_coc_min

     goal P3_2_7:
       forall i: vector t.
         has_length i 5 -> valid_input i ->
         let is_coc_min = is_min (nn_2_7 @@ i) 0 in
         not is_coc_min
   end
We can then verify the resulting verification problem as before:
.. code-block:: console

   $ caisar verify --prover Marabou property_3.why -t 10m
   [caisar] Goal P3_1_1: Timeout
   [caisar] Goal P3_2_7: Valid

.. code-block:: console

   $ caisar verify --prover nnenum property_3.why -t 10m
   [caisar] Goal P3_1_1: Valid
   [caisar] Goal P3_2_7: Valid
The CAISAR modelling language
=============================
Origin: WhyML
-------------
CAISAR heavily relies on Why3 and uses the WhyML language as a basis for its
own interpretation language. A reference of WhyML is available in the original
`Why3 manual <https://why3.lri.fr/doc/syntaxref.html>`_.
However, since Why3 aims to verify a whole range of programs, it cannot
specialize in a particular program structure. For further background, the
paper [F2013]_ from one of Why3's original authors details the rationale and
tradeoffs involved in the WhyML design and implementation. To quote the
author, "[the Why3 team] has come up with logic of compromise".
As CAISAR focuses on artificial intelligence systems, it can make some
additional assumptions on the nature of the inputs, programs and users:

* users will have a basic background in linear algebra, and expect CAISAR to
  reflect this;
* inputs will mostly be multidimensional vectors (the machine learning
  community coined the term "tensor" as well) of floating point values,
  strings, ints or chars;
* the program control flow will mostly be composed of many real or
  floating-point arithmetic operations: there is no loop with runtime
  invariant, nor conditional.
With those constraints in mind, CAISAR provides several extensions to WhyML,
that we detail here. They can be used directly in any WhyML file provided to
CAISAR.
Some of those extensions will be "interpreted". During the translation from
"pure" WhyML terms to actual inputs to provers, symbols will be replaced with
other symbols, or directly computed by CAISAR.
Built-ins
---------

.. index:: Interpretation; interpretation
The built-ins are available under ``stdlib/caisar/``. To access the symbols
they define, the corresponding theory needs to be imported in the scope of the
current one. For instance, to import the symbols defined by the theory
``Vector``, prepend ``use caisar.types.Vector`` at the top of the file.
Vector
******

Predicates
~~~~~~~~~~

.. code-block:: whyml
     length(v1) = length(v2) ->
       forall i: index. valid_index v1 i -> f v1[i] v2[i]
NN
**

Types
~~~~~

.. code-block:: whyml
   (** Type of a neural network.
       CAISAR is able to interpret a symbol of type nn
       to get its control flow graph. **)
   type nn
Functions
~~~~~~~~~

.. code-block:: whyml
   (** Returns a symbol for the neural network with filename n. **)
   function read_model (n: string) : model nn

   (** Returns a vector that represents the application of a neural network
       on an input vector. **)
   function (@@) (n: model nn) (v: vector 'a) : vector 'a
Dataset
*******

Types
~~~~~

.. code-block:: whyml
   (** Dataset type. A dataset is a vector of pairs, with
       inputs of type a and outputs of type b. **)
   type a
   type b
   type dataset = vector (a, b)
Functions
~~~~~~~~~

.. code-block:: whyml
   (** Returns a symbol for the dataset with filename f. **)
   function read_dataset (f: string) : dataset

   function foreach (d: dataset) (f: a -> b -> 'c) : vector 'c =
     Vector.foreach d (fun e -> let a, b = e in f a b)
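Conceptually, ``foreach`` maps a two-argument function over a vector of
(input, output) pairs. A purely illustrative Python analogue (the data below
is made up for the example):

```python
# A dataset seen as a vector (here, a Python list) of (input, output) pairs.
dataset = [([0.1, 0.9], 1), ([0.8, 0.2], 0)]

def foreach(d, f):
    # Unpack each (input, output) pair and apply f, as in the WhyML definition.
    return [f(a, b) for a, b in d]

print(foreach(dataset, lambda inp, out: out))  # [1, 0]
```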
Predicates
~~~~~~~~~~

.. code-block:: whyml
   predicate forall_ (d: dataset) (f: a -> b -> bool) =
     Vector.forall_ d (fun e -> let a, b = e in f a b)
it classifies with the same label all other elements being at an
:math:`l_\infty`-distance of at most :math:`\epsilon \geq 0` from it. More
generally, a neural network is deemed (locally) robust on a dataset whenever
the former property is valid on all the dataset elements. The CAISAR standard
library specifies such a property in terms of the predicate ``robust``.
In the following, we will describe how to use CAISAR to verify that a neural
network is robust on (a fragment of) the MNIST dataset.
valuable for assessing robustness properties by means of formal method tools.
CAISAR provides in `mnist_test.csv
<https://git.frama-c.com/pub/caisar/-/blob/master/examples/mnist/csv/mnist_test.csv>`_
a fragment (:math:`100` images) of the MNIST dataset under the
``examples/mnist/csv`` folder. Each line in this file represents an MNIST
image: in particular, the first column represents the classification label,
and the remaining :math:`784` columns represent the greyscale value of the
respective pixels, rescaled into :math:`[0, 1]`.
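To make the format concrete, here is how such a line could be decoded in plain
Python; the sample line is made up and shortened (a real line has 784 pixel
columns), and this is only an illustration, not part of CAISAR:

```python
# A made-up, shortened MNIST-style CSV line: label first, then pixel
# values rescaled into [0, 1].
line = "7,0.0,0.5,1.0,0.25"

fields = line.split(",")
label = int(fields[0])                    # classification label (0..9)
pixels = [float(v) for v in fields[1:]]   # greyscale values in [0, 1]

assert 0 <= label <= 9
assert all(0.0 <= p <= 1.0 for p in pixels)
```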
Since we actually deal with a *dataset* of finite elements, we will instead
verify a slightly different property: given a classifier :math:`C`, an element
:math:`x \in X`, and some perturbation :math:`\epsilon \geq 0`, it must hold
that :math:`\forall x'. \ \lVert x - x' \rVert_\infty \leq \epsilon
\Rightarrow C(x) = C(x') = y`, where :math:`y` is the expected dataset
classification for :math:`x`. Obviously, such a property must be verified for
all elements of a dataset.
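To build intuition for this property (and only intuition), one can probe a toy
classifier in Python. Everything below is hypothetical, and sampling a few
perturbations is *not* verification; a sound verifier must cover the whole
:math:`l_\infty`-ball:

```python
import itertools

def classify(x):
    # Toy stand-in for a classifier C: thresholds the mean of the input.
    return 1 if sum(x) / len(x) >= 0.5 else 0

def robust_on_corners(x, eps, y):
    # Samples only the corners x' of the l-infinity ball of radius eps
    # around x and checks C(x') = y for each of them.
    for deltas in itertools.product((-1.0, 0.0, 1.0), repeat=len(x)):
        xp = [xi + eps * d for xi, d in zip(x, deltas)]
        if classify(xp) != y:
            return False
    return True

x = [0.9, 0.8]                        # mean 0.85, classified as 1
print(robust_on_corners(x, 0.05, 1))  # True: no sampled corner flips the class
print(robust_on_corners(x, 0.4, 1))   # False: corner [0.5, 0.4] is classified 0
```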
Modelling the problem using WhyML
=================================
As described for the example on :ref:`acas_xu`, we first need to write a
specification file containing a WhyML theory to describe the verification
problem. In principle, we need to formalize the local robustness property as
well as the notions of classifier and dataset. The CAISAR `standard library
<https://git.frama-c.com/pub/caisar/-/blob/master/stdlib/caisar/>`_ provides
theories that define those concepts. We will import the relevant theories with
the ``use`` keyword. As described in :ref:`interpretation`, the ``Vector``
theory provides a vector type, a getter (``[]``) operation and a
``valid_index`` predicate that determines whether the get operation is within
the range of the vector length. ``NN`` defines a type and an application
function (``@@``). We will also need integers and floating point numbers to
declare and define :math:`\epsilon`.
.. code-block:: whyml

   use ieee_float.Float64
   use int.Int
   use caisar.types.Vector
   use caisar.model.Model
   use caisar.model.NN
   use caisar.dataset.Dataset

   type image = vector t
   type label_ = int
We will first write some predicates to take into account the fact that MNIST
counts 10 labels (integers from 0 to 9) in the dataset sample, and that the
input images are normalized (floating point values between 0. and 1.). We
will also define a predicate that, given a label ``l`` and an image ``i``,
checks whether the neural network ``nn`` indeed advises the correct label.

.. code-block:: whyml
   predicate valid_label (l: label_) = 0 <= l <= 9

   predicate advises (n: model nn) (i: image) (l: label_) =
     valid_label l ->
     forall j: int. valid_label j -> j <> l -> (n @@ i)[l] .> (n @@ i)[j]
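Outside WhyML, ``advises`` corresponds to a strict arg-max check. A small
Python analogue with hypothetical scores could read as follows (note that the
WhyML version uses an implication on ``valid_label l``, while this sketch uses
a conjunction for simplicity):

```python
def advises(scores, l):
    # Label l is advised iff its score is strictly greater than the
    # score of every other valid label (0..9).
    return 0 <= l <= 9 and all(
        scores[l] > scores[j] for j in range(10) if j != l
    )

# Hypothetical output scores for the 10 MNIST labels.
scores = [0.1, 0.2, 0.05, 0.9, 0.3, 0.1, 0.0, 0.2, 0.4, 0.15]
print(advises(scores, 3))  # True: label 3 strictly dominates all others
print(advises(scores, 8))  # False: label 3 scores higher than label 8
```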
We can now define the property to check, which is a straightforward
description of property_:

.. code-block:: whyml
     Dataset.forall_ d (robust_around n eps)
Finally, to instantiate this property on concrete neural networks and data
samples, we can define a goal and check the following property:

.. code-block:: whyml
   goal robustness:
     let nn = read_model "nets/MNIST_256_2.onnx" in
     let dataset = read_dataset "csv/mnist_test.csv" in
     let eps = (0.0100000000000000002081668171172168513294309377670288085937500000:t) in
     robust nn dataset eps
A more general formulation of this property is available in `mnist.why
<https://git.frama-c.com/pub/caisar/-/blob/master/examples/mnist/mnist.why>`_.
The corresponding neural network in ONNX format is available under the folder
`/examples/mnist/nets/
<https://git.frama-c.com/pub/caisar/-/blob/master/examples/mnist/nets/>`_.
Verifying the property with CAISAR
==================================
Now we may verify whether the previous robustness specification holds by using
the nnenum prover. This can be done via CAISAR as follows:

.. code-block:: console
   $ caisar verify -p nnenum property.why
   [caisar] Goal robustness: Invalid
The result tells us that there exists at least one image in ``mnist_test.csv``