2022-03-28-machine-learning-for-improving-formal-verification-of-code.md

---
layout: job
title: PhD Position at CEA LIST - LSL
short_title: PhD Position
short: Machine Learning for Improving Formal Verification of Code
date: 28-03-2022
filled: true
keywords: machine learning, graph neural networks, code representation learning, formal methods
---

# {{ page.short }}

[Full description](/download/jobs/2022-03-28-machine-learning-for-improving-formal-verification-of-code.pdf)

#### Context: CEA LIST, Software Security and Reliability Lab

[CEA List](http://www-list.cea.fr/en/)'s offices are located at the heart of
Université Paris-Saclay, the largest European cluster of public and private
research. Within [CEA List](http://www-list.cea.fr/en/), the Software Safety and
Security Lab has an ambitious goal: help designers, developers and validation
experts ship high-confidence systems and software.

Systems in our surroundings are getting more and more complex, and we have built
a reputation for efficiently using formal reasoning to demonstrate their
trustworthiness through the design of methods and tools to ensure that
real-world systems can comply with the highest safety and security standards. In
doing so, we get to interact with the most creative people in academia and the
industry, worldwide.

Our organizational structure is simple: those who pioneer new concepts are the
ones who get to implement them. We are a fifty-person team, and your work will
have a direct and visible impact on the state of formal verification.

#### Work Description

Formal verification consists in providing mathematical guarantees on the
conformity of a program's behavior with respect to certain property
specifications. In particular, static program verification aims at determining
the properties of a program for all possible concrete inputs, without executing
the program itself. A notable example is static verification of the absence of
errors at runtime.

Our team develops [Frama-C](https://www.frama-c.com), a code analysis platform
for C programs which provides several analyzers as plug-ins.
[Frama-C](https://www.frama-c.com) allows the user to annotate a C program with
formal specifications, written in the
[ACSL](https://www.frama-c.com/html/acsl.html) specification language, and then
ensure it satisfies its formal specification by relying on several static
verification techniques including abstract interpretation, provided by the
plug-in [Eva](https://www.frama-c.com/fc-plugins/eva.html), and the weakest
preconditions calculus, provided by the plug-in
[WP](https://www.frama-c.com/fc-plugins/wp.html).

Both plug-ins provide highly parametrizable techniques that may be efficiently
combined, but their activation may be costly in terms of resources such as time
of computation and memory footprint. It requires a thorough knowledge of
[Frama-C](https://www.frama-c.com) and its analysis techniques to choose wisely
which to use in which cases. Moreover, many of these techniques are more or less
based on heuristics which are usually manually conceived. These heuristics are
often suboptimal, fragile, and require considerable technical knowledge and
effort to be devised. As such, they do not generalize or scale well to different
code bases, and need to be updated over time.

Machine learning has been recently used to perform several tasks on code, and
seems particularly adapted to help in overcoming the aforementioned obstacles.
On the one hand, it allows to treat the difficulty no longer as a problem of
understanding the analysis tool but as a problem of understanding the forms of
code adapted to the different techniques. On the other hand, the automatic
learning of strategies can be repeated as the requirements and the tool evolve.
These improvements can be achieved at a much finer level than a human can
accomplish in a reasonable amount of time.

The ultimate goal of the PhD is to integrate machine learning approaches to the
[Frama-C](https://www.frama-c.com) static analyzers in order to improve their
usability and scalability. The PhD will start by studying which heuristics
already in place in [Eva](https://www.frama-c.com/fc-plugins/eva.html) or
[WP](https://www.frama-c.com/fc-plugins/wp.html) could be automatically learned.
Later, the PhD will investigate the best representations and learning algorithms
for treating code, with a particular focus on maintaining as much as possible
the semantic elements. A special interest will be devoted to graph embeddings
and graph neural networks which seem the best approaches for the task.

The PhD student will design and implement a machine learning pipeline for code
verification strategies. Furthermore, the PhD student will integrate such
strategies into [Frama-C](https://www.frama-c.com) and evaluate their
performances with respect to those already in place today.

#### Application

Knowledge in the following fields is required:

- Machine learning or deep learning
- Python programming

Knowledge in the following fields is welcome:

- OCaml programming (or another functional programming language)
- Deep learning frameworks (TensorFlow or PyTorch)
- Program verification (abstract interpretation, deductive verification, etc.)
- C programming
- Formal semantics of programming languages

**Salary:** Competitive PhD salary.

**Availability:** September 2022. However, since a 3+ month procedure for
administrative and security purposes is required, we suggest to get in touch
with us as soon as possible.

**Contact:**

- [Michele Alberti](mailto:julien.signoles@cea.fr)
- [Valentin Perrelle](mailto:valentin.perrelle@cea.fr)

Please join a detailed CV, and a motivation letter.