--- layout: fc_discuss_archives title: Message 71 from Frama-C-discuss on December 2009 ---
Hi Pascal, Thanks a lot for your answers (the short one and the long one). I didn't know the possibility of evaluating an expression in a chosen statement. It's great. And I didn't know also that padding were considered as uninitialized segment in libentry mode. You heard me asking > "How would one go about forcing the analyzer to be > more precise on this program, and reach the desired > conclusion that function f1 is safe from > padding-related undefinedness?", I hear you asking. > but a this time, I was really sleeping! My point of view, is that for critical development, we should explicitly declare padding fields. For this exemple, we should declare : typedef struct { char c; char padding1 ; T_TAB t } So, in this case, the false alarm is a good information for a lack of explicit padding (we just have to be aware of this). thank you again, St?phane CUOQ Pascal a ?crit : >> Here is an exemple where padding has >> a consequence on warnings. >> > > Nice find. I nominate this question for the FAQ > in the next version of the value analysis' documentation. > > >> Could someone explain me what happens. >> > > You were using a command-line such as: > > frama-c -val t.c -lib-entry -main f1 > > Well, the short answer to your question first: > > In -lib-entry mode, the value analysis assumes > padding bytes in structures to be uninitialized. > This way, any memory access that is sure to > read from a padding area is detected, warned about, > and excluded from further propagation, > while a memory access that may or may not read > from a padding area is detected, warned about > and the propagation continues with > the only values that make sense (that values that do *not* > come from the padding area). > > If the warning in this latter case is a false alarm, > then there is no loss of precision for the rest of the analysis. > It could be argued that we should use a more explicit value > than "Uninitialized", perhaps "Padding". > Yes, that would be clearer. I am making a note of this. > > Okay, here really comes the short answer now : > > the array of structs tV1 does not contain any padding bytes : > > tV1[0..9] IN [--..--] > > the array of structs tV2 contains some padding bytes. > > tV2[0].c IN [--..--] > [0].[bits 8 to 15] IN UNINITIALIZED > {[0].t[0..1023]; [1].c; } IN [--..--] > [1].[bits 8 to 15] IN UNINITIALIZED > {[1].t[0..1023]; [2].c; } IN [--..--] > ... you get the idea ... > [9].t[0..1023] IN [--..--] > > Ergo, any dumb analyzer could guarantee > that memory accesses within tV1 do not read from > padding bytes, whereas it is less obvious that a memory > access inside tV2 is not an undefined access to > padding bytes. > > Long version: > > launch the GUI with > > frama-c-gui -val t.c -lib-entry -main f1 > > Select the statement in the first (resp. second) loop > and right-click, in the contextual menu select > "evaluate expression", and enter "(char *)(tV1[id].t) + i" > (resp. "(char *)(tV2[id].t) + i") in the dialog. > > You get respectively: > Before the selected statement, all the values taken by the expression (char *)(tV1[id].t) + i are contained in {{ &tV1 + [4..20519] ;}} > Before the selected statement, all the values taken by the expression (char *)(tV2[id].t) + i are contained in {{ &tV2 + [2..20499] ;}} > > These offsets are expressed in bytes. > In the first line, the offset is at least 4 because > tV1[0].i is a 4-byte int followed immediately by tV1[0].t[0]. > In the second line, tV2[0].i is a char, tV2[0].t[0] > needs to be 16-bit aligned and is therefore preceded by > one byte of padding, whence the lower bound of 2 for > the offset at which this memory access can take place. > > The upper bounds for the offsets are slightly different > because the sizes of the structures are slightly different. > tV1 contains slightly more bytes than tV2. > > The important thing to notice in these offsets > is what is not there: a congruence information that would > ensure that the program is not accessing any random > byte in tV1 or tV2. Indeed, this information cannot be > represented in the value analysis' format r%m. > Consecutive bytes are accessed in tV1[id].t or > tV2[id].t, so there is no interesting remainder and modulo that > characterize the stride of the memory access here. > Therefore the analyzer is not able to infer that accessing > inside tV2 does not read from the padding (which > is undefined), and it is only able to reach this conclusion > for tV1 because there are no padding bytes there at all! > > But there is more! > > "How would one go about forcing the analyzer to be > more precise on this program, and reach the desired > conclusion that function f1 is safe from > padding-related undefinedness?", I hear you asking. > > There are two ways to reason about this: > > The first line of reasoning, which I recommend, is that there is > one big button to change the trade-off between precision and > resource consumption in the Value Analysis, and this button is > called "-slevel". Therefore one should use this option before even > starting to think about the problem. > > The second possible line of reasoning is that > the imprecision comes from the fact that > the analysis tries to handle all iterations in each loop > simultaneously, so one should look for a way to make > the analyzer treat each iteration separately if possible. > This result can be achieved with the option "-slevel". > Each loop takes 2048 iterations (the loops iterate on > the bytes of arrays of 1024 shorts), and we don't want > to have to run the analysis again because we were > a little short, so we allow 2050 iterations to be > analyzed separately before starting to lose precision. > > Manzana:~ pascal$ time frama-c -slevel 2050 -val t.c -lib-entry -main f1 >L 2>&1 > real 0m18.997s > Manzana:~ pascal$ grep t\\.c:32 L > Manzana:~ pascal$ > > Note that unrolling each loop does not make the abstract value > for expression (char *)(tV2[id].t) + i a singleton. > There are still different values for this expression, > because variable id is not precisely known. > > You cannot observe the values used to analyze each iteration > with -slevel, but you can try -ulevel instead, which > may or may not produce an AST too big for analysis or > for displaying in the GUI. Or you can use one of the provided > built-in for printing abstract values during the analysis: > > /* loop 2 */ > for(i=0; i<sizeof(T_TAB); i++) > { > Frama_C_show_each_address((char *)(tV2[id].t) + i); > ((char*)v2.t)[i] = ((char*)tV2[id].t)[i] ; > } > > Using the same command-line again produces in addition: > > [value] Called Frama_C_show_each_address({{ &tV2 + [2..18452],2%2050 ;}}) > [value] Called Frama_C_show_each_address({{ &tV2 + [3..18453],3%2050 ;}}) > [value] Called Frama_C_show_each_address({{ &tV2 + [4..18454],4%2050 ;}}) > [value] Called Frama_C_show_each_address({{ &tV2 + [5..18455],5%2050 ;}}) > ... you get the idea ... > [value] Called Frama_C_show_each_address({{ &tV2 + [2049..20499],2049%2050 ;}}) > > Each of the abstract values for the address > (char *)(tV2[id].t) + i > is not, because of the indeterminacy on id, a singleton. > Still, the indeterminacy that remains can be captured precisely > by the value analysis' r%m representation. At each iteration > it is able to capture the fact we are accessing inside a field t > (and therefore not inside padding), but in a different way > for each iteration. > > Pascal > __ > If I had a blog, this would probably had been the second post there. >