---
layout: post
author: Pascal Cuoq
date: 2011-11-05 11:53 +0200
categories: derived-analysis
format: xhtml
title: "What functions does a function use: option -users"
summary: 
---
{% raw %}
<h2>Exploring unfamiliar code</h2> 
<p>When exploring unfamiliar code, it is often useful to know which functions a function <code>f()</code> uses. This sounds like something that can be computed from the callgraph, and there exist plenty of tools that can extract a callgraph from a C program, but the callgraph approach has two drawbacks:</p> 
<ol> 
<li>A static callgraph does not include calls made through function pointers. Therefore, you do not see all the functions that <code>f()</code> uses: the list omits the functions that were directly or indirectly called through a function pointer.</li> 
<li>The set of functions computed from the callgraph is over-approximated: if <code>f()</code> calls <code>g()</code> and <code>g()</code> may sometimes call <code>h()</code>, it does not necessarily follow that <code>f()</code> uses <code>h()</code>. Perhaps <code>g()</code> never calls <code>h()</code> when it is called from <code>f()</code>, but only when it is called from another function <code>k()</code>.</li> 
</ol> 
<h2>Example</h2> 
<p>Here is an example that illustrates both issues.</p> 
<pre>enum op { ADD, MULT }; 
void copy_int(int *src, int *dst) 
{ 
  *dst = *src; 
} 
int really_add(int u, int v) 
{ 
  return u + v; 
} 
int really_mult(int u, int v) 
{ 
  return u * v; 
} 
int do_op(enum op op, int u, int v) 
{ 
  if (op == ADD) 
    return really_add(u, v); 
  else if (op == MULT) 
    return really_mult(u, v); 
  else  
    return -1; 
} 
int add(int x, int y) 
{ 
  int a, b, res; 
  void (*fun_ptr)(int*, int*); 
  fun_ptr = copy_int; 
  (*fun_ptr)(&amp;x, &amp;a); 
  (*fun_ptr)(&amp;y, &amp;b); 
  res = do_op(ADD, a, b); 
  return res; 
} 
</pre> 
<p>Using a syntactic callgraph to compute the functions used by <code>add()</code>, one finds <code>do_op()</code>, <code>really_add()</code>, and <code>really_mult()</code>. This list is over-approximated because <code>add()</code> does not really use <code>really_mult()</code>. 
More importantly, the list omits function <code>copy_int()</code>, which <strong>is</strong> used by <code>add()</code>.</p> 
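<p>To make the syntactic computation concrete, here is a minimal sketch of it: a depth-first traversal over the direct calls that appear in the source text of the example. The edge table is written by hand, and the names <code>direct_calls</code>, <code>syntactic_uses()</code> and <code>print_uses()</code> are made up for this illustration. They do not come from any existing callgraph tool.</p>

```c
#include <stdio.h>
#include <string.h>

/* Functions of the example, used as indices into the edge table.
   ADD_FN stands for add(); the name avoids confusion with enum op's ADD. */
enum fn { COPY_INT, REALLY_ADD, REALLY_MULT, DO_OP, ADD_FN, NFUNCS };

static const char *const fn_names[NFUNCS] = {
  "copy_int", "really_add", "really_mult", "do_op", "add"
};

/* Direct calls visible in the source text (hand-written assumption).
   The two calls made through fun_ptr in add() are absent: a purely
   syntactic tool sees a pointer dereference there, not a callee name. */
static const int direct_calls[NFUNCS][NFUNCS] = {
  [DO_OP]  = { [REALLY_ADD] = 1, [REALLY_MULT] = 1 },
  [ADD_FN] = { [DO_OP] = 1 },
};

/* Depth-first traversal: marks every function reachable from f. */
static void mark_reachable(enum fn f, int used[NFUNCS])
{
  for (int callee = 0; callee < NFUNCS; callee++) {
    if (direct_calls[f][callee] && !used[callee]) {
      used[callee] = 1;
      mark_reachable((enum fn)callee, used);
    }
  }
}

/* Fills used[] with the functions that root uses according to the
   syntactic callgraph alone. */
void syntactic_uses(enum fn root, int used[NFUNCS])
{
  memset(used, 0, NFUNCS * sizeof used[0]);
  mark_reachable(root, used);
}

/* Prints the result in the form "root: callee callee ...". */
void print_uses(enum fn root)
{
  int used[NFUNCS];
  syntactic_uses(root, used);
  printf("%s:", fn_names[root]);
  for (int f = 0; f < NFUNCS; f++)
    if (used[f])
      printf(" %s", fn_names[f]);
  printf("\n");
}
```

<p>Calling <code>print_uses(ADD_FN)</code> prints <code>add: really_add really_mult do_op</code>: <code>really_mult</code> is there although <code>add()</code> never reaches it, and <code>copy_int</code> is missing although <code>add()</code> calls it twice.</p>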
<h2>Frama-C's users analysis</h2> 
<p>Frama-C's users analysis computes this list instead:</p> 
<pre>$ frama-c -users -lib-entry -main add example.c 
... 
[users] ====== DISPLAYING USERS ====== 
        do_op: really_add  
        add: copy_int really_add do_op  
        ====== END OF USERS ========== 
</pre> 
<p>The users analysis exploits the results of the value analysis, so the results hold for the initial conditions the value analysis was configured for. Here, the value analysis was instructed to study the function <code>add()</code> by itself. In these conditions, <code>do_op()</code> only calls <code>really_add()</code>, but if the analysis focused on a larger program, it would see that <code>do_op()</code> also sometimes calls <code>really_mult()</code>. The users analysis can tell that <code>add()</code> uses <code>copy_int()</code>, <code>really_add()</code>, and <code>do_op()</code>, and does not use <code>really_mult()</code>.</p> 
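<p>As a sketch of what such a larger program might look like, the variant below adds a second caller of <code>do_op()</code>. The functions <code>mult()</code> and <code>compute_both()</code> are hypothetical additions that are not part of the example above, and <code>add()</code> is shortened by dropping the copies through <code>fun_ptr</code>, which play no role here. With <code>compute_both()</code> as the entry point, <code>do_op()</code> is reached with both <code>ADD</code> and <code>MULT</code>, so it does call <code>really_mult()</code> on some executions, yet never on those that come from <code>add()</code>.</p>

```c
enum op { ADD, MULT };

int really_add(int u, int v)
{
  return u + v;
}

int really_mult(int u, int v)
{
  return u * v;
}

int do_op(enum op op, int u, int v)
{
  if (op == ADD)
    return really_add(u, v);
  else if (op == MULT)
    return really_mult(u, v);
  else
    return -1;
}

/* Shortened version of add() from the example: always passes ADD. */
int add(int x, int y)
{
  return do_op(ADD, x, y);
}

/* Hypothetical second caller: the only place where MULT is passed. */
int mult(int x, int y)
{
  return do_op(MULT, x, y);
}

/* Hypothetical entry point that reaches do_op() through both callers. */
int compute_both(int x, int y)
{
  return add(x, y) + mult(x, y);
}
```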
<p>This kind of synthetic information is very useful when trying to get a grip on large programs, for instance, when trying to extract a useful function from a large codebase to make it a library. Unsurprisingly, plenty of tools already existed before Frama-C that tried to provide this sort of information. But having information on the dynamic behavior of the program can make a large difference in the value of the synthetic information computed.</p> 
<p>My colleagues at Airbus Opérations SAS and Atos SA will present serious applications of the <code>-users</code> option at <a href="http://www.erts2012.org/">ERTS² 2012</a> (next February).</p>
{% endraw %}