2011-08-26-The-OCaml-compiler-does-have-some-nice-optimizations.html

---
layout: post
author: Pascal Cuoq
date: 2011-08-26 12:48 +0200
categories: OCaml value
format: xhtml
title: "The OCaml compiler does have some nice optimizations"
summary: 
---
{% raw %}
<p>Many OCaml programmers use it because it offers a reasonable (to them) compromise between expressivity and control over resources use. Serious OCaml users are often heard complaining about relatively simple optimizations that the compiler does not have. But this reveals as much of the kind of programmers that end up using OCaml as it does of OCaml itself.</p> 
<p>Case in point: the value analysis in Boron used to have a type similar to:</p> 
<pre> type t = bool * bool * v 
</pre> 
<p>If this was C, the definition above would be a <code>struct</code>. Here, <code>bool</code> is the predefined type that you expect, <code>v</code> is the type of values a C expression can have in the value analysis, and the type <code>t</code> being defined is the type of values as found in memory. A value in memory can be a normal value (the <code>v</code> part) or it may have reason to be uninitialized (the first <code>bool</code>) or it may be a dangling pointer (the second <code>bool</code>). During analysis, a value in memory can be all of the above at once, as is variable <code>y</code> after the following snippet:</p> 
<pre> void f(void) 
 { 
   int* y; 
   if (unknown condition 1) 
     y = NULL; 
   else if (unknown condition 2) 
   { 
     int l; 
     y = &amp;l; 
   } 
   ... 
 } 
</pre> 
<p>After the first instructions of function <code>f</code>, variable <code>y</code> is either uninitialized or a dangling pointer or the normal value <code>NULL</code>. The value analysis \uninitialized" flag can also be seen in action in <a href="/index.php?post/2011/08/12/Putnum">this post</a>.</p> 
<p>An OCaml value of the above type <code>t</code> occupies four words in memory: one header word for automatic memory management  and one for each field. Each field occupies one word to make memory management  genericity  etc. easier.</p> 
<p>In the period between Boron and Carbon  I decided to make the representation of type <code>t</code> a bit more compact (this is one of several memory use improvements I boasted about in this blog). The two booleans can be represented as bits inside a same integer; this would save one word. I decided to go further and  in effect  to borrow two unused bits from the header word by using the definition below:</p> 
<pre> type t = 
     C_uninit_escaping of v 
   | C_uninit_noescaping of v 
   | C_init_escaping of v 
   | C_init_noescaping of v 
</pre> 
<p>This type definition is no longer like a C <code>struct</code>. It is more like an union  an union of <code>v</code> or <code>v</code> or <code>v</code> or <code>v</code>  but a <a href="http://en.wikipedia.org/wiki/Tagged_union">discriminated union</a> where it's possible to recognize what kind of <code>v</code> one is talking about. C forces the programmer to manage <a href="http://en.wikipedia.org/wiki/Spivak_pronoun">eir</a> own tag. In OCaml  some bits are reserved in the bloc header for the tag. A value of the new type <code>t</code> always occupies two words  one word for the header (including some bits for the tag)  and one word for the contents of type <code>v</code>.</p> 
<p>During analysis  when the C program gets a value from memory and uses it in a computation  it is necessary to get the part of type <code>v</code> from a value of type <code>t</code>. With the old definition  the function was:</p> 
<pre> let get_v (_  _  v) = v 
</pre> 
<p>With the new definition  the same function <code>get_v</code> would be:</p> 
<pre> let get_v =  
   function  
      C_uninit_escaping  v 
    | C_uninit_noescaping v 
    | C_init_escaping v 
    | C_init_noescaping v -&gt; v 
</pre> 
<p>I was afraid that this would be compiled into an ugly 4-case test  so I used a different implementation  based on low-level OCaml primitives that are not supposed to be used:</p> 
<pre> let get_v : t -&gt; v = fun x -&gt; Obj.obj (Obj.field (Obj.repr x) 0) 
</pre> 
<p>It is ugly too  but my hope was that the compiled code would be better. It literally means "forget the type of the argument <code>x</code> (something you are never supposed to do in OCaml) and get the first field of that  whatever that is". 
I was so used to the compiler not optimizing to my liking that I did not even compare the code for the two versions. 
I should have known better. Julien Signoles discovered that the compiler generates the following code for the first  pure OCaml version:</p> 
<pre>        movq    (%rax)  %rax 
        ret 
</pre> 
<p>The above is "AT&amp;T" assembly syntax  and corresponds to <code>return *arg;</code> in C. 
It's even better than the generated code for the second  low-level version because in the first version  the compiler has more type information to rely on. Plus  since I am reading the assembly for the whole module  I can confirm that the function gets inlined where it is called directly (another optimization that the OCaml compiler does very well).</p>
{% endraw %}