--- layout: job title: PhD Position at CEA LIST - LSL short_title: PhD Position short: Machine Learning for Improving Formal Verification of Code date: 28-03-2022 filled: true keywords: machine learning, graph neural networks, code representation learning, formal methods --- # {{ page.short }} [Full description](/download/jobs/2022-03-28-machine-learning-for-improving-formal-verification-of-code.pdf) #### Context: CEA LIST, Software Security and Reliability Lab [CEA List](http://www-list.cea.fr/en/)'s offices are located at the heart of Université Paris-Saclay, the largest European cluster of public and private research. Within [CEA List](http://www-list.cea.fr/en/), the Software Safety and Security Lab has an ambitious goal: help designers, developers and validation experts ship high-confidence systems and software. Systems in our surroundings are getting more and more complex, and we have built a reputation for efficiently using formal reasoning to demonstrate their trustworthiness through the design of methods and tools to ensure that real-world systems can comply with the highest safety and security standards. In doing so, we get to interact with the most creative people in academia and the industry, worldwide. Our organizational structure is simple: those who pioneer new concepts are the ones who get to implement them. We are a fifty-person team, and your work will have a direct and visible impact on the state of formal verification. #### Work Description Formal verification consists in providing mathematical guarantees on the conformity of a program's behavior with respect to certain property specifications. In particular, static program verification aims at determining the properties of a program for all possible concrete inputs, without executing the program itself. A notable example is static verification of the absence of errors at runtime. Our team develops [Frama-C](https://www.frama-c.com), a code analysis platform for C programs which provides several analyzers as plug-ins. [Frama-C](https://www.frama-c.com) allows the user to annotate a C program with formal specifications, written in the [ACSL](https://www.frama-c.com/html/acsl.html) specification language, and then ensure it satisfies its formal specification by relying on several static verification techniques including abstract interpretation, provided by the plug-in [Eva](https://www.frama-c.com/fc-plugins/eva.html), and the weakest preconditions calculus, provided by the plug-in [WP](https://www.frama-c.com/fc-plugins/wp.html). Both plug-ins provide highly parametrizable techniques that may be efficiently combined, but their activation may be costly in terms of resources such as time of computation and memory footprint. It requires a thorough knowledge of [Frama-C](https://www.frama-c.com) and its analysis techniques to choose wisely which to use in which cases. Moreover, many of these techniques are more or less based on heuristics which are usually manually conceived. These heuristics are often suboptimal, fragile, and require considerable technical knowledge and effort to be devised. As such, they do not generalize or scale well to different code bases, and need to be updated over time. Machine learning has been recently used to perform several tasks on code, and seems particularly adapted to help in overcoming the aforementioned obstacles. On the one hand, it allows to treat the difficulty no longer as a problem of understanding the analysis tool but as a problem of understanding the forms of code adapted to the different techniques. On the other hand, the automatic learning of strategies can be repeated as the requirements and the tool evolve. These improvements can be achieved at a much finer level than a human can accomplish in a reasonable amount of time. The ultimate goal of the PhD is to integrate machine learning approaches to the [Frama-C](https://www.frama-c.com) static analyzers in order to improve their usability and scalability. The PhD will start by studying which heuristics already in place in [Eva](https://www.frama-c.com/fc-plugins/eva.html) or [WP](https://www.frama-c.com/fc-plugins/wp.html) could be automatically learned. Later, the PhD will investigate the best representations and learning algorithms for treating code, with a particular focus on maintaining as much as possible the semantic elements. A special interest will be devoted to graph embeddings and graph neural networks which seem the best approaches for the task. The PhD student will design and implement a machine learning pipeline for code verification strategies. Furthermore, the PhD student will integrate such strategies into [Frama-C](https://www.frama-c.com) and evaluate their performances with respect to those already in place today. #### Application Knowledge in the following fields is required: - Machine learning or deep learning - Python programming Knowledge in the following fields is welcome: - OCaml programming (or another functional programming language) - Deep learning frameworks (TensorFlow or PyTorch) - Program verification (abstract interpretation, deductive verification, etc.) - C programming - Formal semantics of programming languages **Salary:** Competitive PhD salary. **Availability:** September 2022. However, since a 3+ month procedure for administrative and security purposes is required, we suggest to get in touch with us as soon as possible. **Contact:** - [Michele Alberti](mailto:julien.signoles@cea.fr) - [Valentin Perrelle](mailto:valentin.perrelle@cea.fr) Please join a detailed CV, and a motivation letter.