--- layout: fc_discuss_archives title: Message 71 from Frama-C-discuss on December 2009 ---
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Frama-c-discuss] how padding are introduced by Frama-C



Hi Pascal,

Thanks a lot for your answers (the short one and the long one).
I didn't know the possibility of evaluating an expression in a chosen 
statement. It's great.
And I didn't know also that padding were considered as uninitialized 
segment in libentry mode.

You heard me asking
> "How would one go about forcing the analyzer to be
> more precise on this program, and reach the desired
> conclusion that function f1 is safe from
> padding-related undefinedness?", I hear you asking.
>   
but a this time, I was really sleeping!

My point of view, is that for critical development, we should explicitly 
declare padding fields.
For this exemple, we should declare :
typedef struct {
    char c;
    char padding1 ;
    T_TAB t
}

So, in this case, the false alarm is a good information for a lack of 
explicit padding (we just have to be aware of this).

thank you again,

St?phane


CUOQ Pascal a ?crit :
>> Here is an exemple where padding has
>> a consequence on warnings.
>>     
>
> Nice find. I nominate this question for the FAQ
> in the next version of the value analysis' documentation.
>
>   
>> Could someone explain me what happens.
>>     
>
> You were using a command-line such as:
>
> frama-c -val t.c -lib-entry -main f1
>
> Well, the short answer to your question first:
>
> In -lib-entry mode, the value analysis assumes
> padding bytes in structures to be uninitialized.
> This way, any memory access that is sure to
> read from a padding area is detected, warned about,
> and excluded from further propagation,
> while a memory access that may or may not read
> from a padding area is detected, warned about
> and the propagation continues with
> the only values that make sense (that values that do *not*
> come from the padding area).
>
> If the warning in this latter case is a false alarm,
> then there is no loss of precision for the rest of the analysis.
> It could be argued that we should use a more explicit value
> than "Uninitialized", perhaps "Padding".
> Yes, that would be clearer. I am making a note of this.
>
> Okay, here really comes the short answer now :
>
> the array of structs tV1 does not contain any padding bytes :
>
>         tV1[0..9] IN [--..--]
>
> the array of structs tV2 contains some padding bytes. 
>
>         tV2[0].c IN [--..--]
>            [0].[bits 8 to 15] IN UNINITIALIZED
>            {[0].t[0..1023]; [1].c; } IN [--..--]
>            [1].[bits 8 to 15] IN UNINITIALIZED
>            {[1].t[0..1023]; [2].c; } IN [--..--]
>  ... you get the idea ...
>            [9].t[0..1023] IN [--..--]
>
> Ergo, any dumb analyzer could guarantee
> that memory accesses within tV1 do not read from
> padding bytes, whereas it is less obvious that a memory
> access inside tV2 is not an undefined access to
> padding bytes.
>
> Long version: 
>
> launch the GUI with
>
> frama-c-gui -val t.c -lib-entry -main f1
>
> Select the statement in the first (resp. second) loop
> and right-click, in the contextual menu select
> "evaluate expression", and enter "(char *)(tV1[id].t) + i"
> (resp. "(char *)(tV2[id].t) + i") in the dialog.
>
> You get respectively:
> Before the selected statement, all the values taken by the expression (char *)(tV1[id].t) + i are contained in {‌{ &tV1 + [4..20519] ;}‌}
> Before the selected statement, all the values taken by the expression (char *)(tV2[id].t) + i are contained in {‌{ &tV2 + [2..20499] ;}‌}
>
> These offsets are expressed in bytes.
> In the first line, the offset is at least 4 because
> tV1[0].i is a 4-byte int followed immediately by tV1[0].t[0].
> In the second line, tV2[0].i is a char, tV2[0].t[0]
> needs to be 16-bit aligned and is therefore preceded by
> one byte of padding, whence the lower bound of 2 for
> the offset at which this memory access can take place.
>
> The upper bounds for the offsets are slightly different
> because the sizes of the structures are slightly different.
> tV1 contains slightly more bytes than tV2.
>
> The important thing to notice in these offsets
> is what is not there: a congruence information that would
> ensure that the program is not accessing any random
> byte in tV1 or tV2. Indeed, this information cannot be 
> represented in the value analysis' format r%m.
> Consecutive bytes are accessed in tV1[id].t or
> tV2[id].t, so there is no interesting remainder and modulo that
> characterize the stride of the memory access here.
> Therefore the analyzer is not able to infer that accessing
> inside tV2 does not read from the padding (which
> is undefined), and it is only able to reach this conclusion
> for tV1 because there are no padding bytes there at all!
>
> But there is more!
>
> "How would one go about forcing the analyzer to be
> more precise on this program, and reach the desired
> conclusion that function f1 is safe from
> padding-related undefinedness?", I hear you asking.
>
> There are two ways to reason about this:
>
> The first line of reasoning, which I recommend, is that there is
> one big button to change the trade-off between precision and
> resource consumption in the Value Analysis, and this button is
> called "-slevel". Therefore one should use this option before even
> starting to think about the problem.
>
> The second possible line of reasoning is that
> the imprecision comes from the fact that
> the analysis tries to handle all iterations in each loop
> simultaneously, so one should look for a way to make
> the analyzer treat each iteration separately if possible.
> This result can be achieved with the option "-slevel".
> Each loop takes 2048 iterations (the loops iterate on
> the bytes of arrays of 1024 shorts), and we don't want
> to have to run the analysis again because we were
> a little short, so we allow 2050 iterations to be
> analyzed separately before starting to lose precision.
>
> Manzana:~ pascal$ time frama-c -slevel 2050 -val t.c -lib-entry -main f1 >L 2>&1
> real   0m18.997s
> Manzana:~ pascal$ grep t\\.c:32 L
> Manzana:~ pascal$ 
>
> Note that unrolling each loop does not make the abstract value
> for expression (char *)(tV2[id].t) + i a singleton.
> There are still different values for this expression,
> because variable id is not precisely known.
>
> You cannot observe the values used to analyze each iteration
> with -slevel, but you can try -ulevel instead, which
> may or may  not produce an AST too big for analysis or
> for displaying in the GUI. Or you can use one of the provided
> built-in for printing abstract values during the analysis:
>
>   /* loop 2 */
>   for(i=0; i<sizeof(T_TAB); i++)
>     {
>       Frama_C_show_each_address((char *)(tV2[id].t) + i);
>       ((char*)v2.t)[i] = ((char*)tV2[id].t)[i] ;
>     }
>
> Using the same command-line again produces in addition:
>
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [2..18452],2%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [3..18453],3%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [4..18454],4%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [5..18455],5%2050 ;}‌})
> ... you get the idea ...
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [2049..20499],2049%2050 ;}‌})
>
> Each of the abstract values for the address 
> (char *)(tV2[id].t) + i 
> is not, because of the indeterminacy on id, a singleton.
> Still, the indeterminacy that remains can be captured precisely
> by the value analysis' r%m representation. At each iteration
> it is able to capture the fact we are accessing inside a field t
> (and therefore not inside padding), but in a different way
> for each iteration.
>
> Pascal
> __
> If I had a blog, this would probably had been the second post there.
>