---
layout: post
author: Pascal Cuoq
date: 2011-02-25 12:34 +0200
categories: floating-point
format: xhtml
title: "Numerical functions are not merely twisted"
summary:
---
{% raw %}
In a previous post, there was a cosine function that I made up for the purpose of blogging about it. Let us now look at a real cosine function (OpenBSD's libm, derived from widespread code written at Sun in the 1990s) to get a feeling for how far off we were:
<pre>
double
__kernel_cos(double x, double y)
{
	double a, hz, z, r, qx;
	int32_t ix;
	GET_HIGH_WORD(ix, x);
	ix &= 0x7fffffff;			/* ix = |x|'s high word */
	if(ix<0x3e400000) {			/* if x < 2**27 */
	    if(((int)x)==0) return one;		/* generate inexact */
	}
	z = x*x;
	r = z*(C1+z*(C2+z*(C3+z*(C4+z*(C5+z*C6)))));
	if(ix < 0x3FD33333)			/* if |x| < 0.3 */
	    return one - (0.5*z - (z*r - x*y));
	else {
	    if(ix > 0x3fe90000) {		/* x > 0.78125 */
		qx = 0.28125;
	    } else {
		INSERT_WORDS(qx, ix-0x00200000, 0);	/* x/4 */
	    }
	    hz = 0.5*z-qx;
	    a  = one-qx;
	    return a - (hz - (z*r-x*y));
	}
}
</pre>
Some remarks in decreasing order of obviousness:
<pre>	if(ix<0x3e400000) {			/* if x < 2**27 */</pre>
contains a clear bug: the comment should undoubtedly be "if x < 2**-27".

The function computes the cosine of <code>x+y</code>. Argument <code>y</code> is the "tail" of <code>x</code>: since <code>x</code> is a <code>double</code> in the range -π/4 .. π/4, <code>y</code> is very small and only needs to be taken into account in the lower-order terms of the polynomial approximation.

The function dissects the <code>double</code> argument <code>x</code>
with 32-bit integers, and this is mostly for purposes of speed (only "mostly": in other functions from the same library, the same trick is used where the only justification is speed). This made sense at the time, but on a modern architecture with a sensible ABI it is very much misguided. Argument <code>x</code> would be passed in a floating-point register and would stay there for the duration of the function if this pessimization did not force it to be written to memory. Worse, I am pretty sure that on some of the architectures implementing SSE, the program is penalized by long stalls when it accesses, through general-purpose registers, memory that has just been written from SSE registers.

Lastly, this function is written to take advantage of the difference between <code>double</code>s
and reals. Look at it this way: if you did not have to think about this difference, the whole mess with <code>qx</code> would not be part of this function, because with reals, subtracting the same quantity from both operands of a subtraction has no effect on the result. And yes, this correction is only there to obtain precision at a scale much smaller than the one we were verifying with the value analysis. The problem is that this modification makes the code more complex for everyone. If you only need a precision of one thousandth, but you absolutely must be sure that you have it (otherwise the plane/nuclear power plant/satellite may crash), it is now less certain that you have this precision, because the increased complexity makes it less obvious that the function works properly. We even found a bug, although it was only in a comment. You'd better verify the function carefully if that's the one you have decided to use.

To be fair regarding the last point, I chose a cosine implementation on purpose, because its result gets closer to zero (making it easier to observe a possible imprecision) as its argument gets farther from zero (meaning less precision is available for computations if you are not careful). The <code>qx</code>
trick actually serves to get computations closer to zero where more precision is (in absolute terms) available. The sine implementation does not need such a trick.