This page is about libm for the musl libc.
Writing math code from scratch is a huge amount of work, so already existing code is used. Several math functions are taken from the freebsd libm and a few from the openbsd libm implementations. Both of them are based on fdlibm. The freebsd libm seems to be the most well maintained and most correct version of fdlibm.
sources:
freebsd: svn checkout svn://svn.freebsd.org/base/head/lib/msun
openbsd: cvs -d $CVSROOT get src/lib/libm
|error| < 1.5 ulp should hold for most functions. (error is the difference between the exact result and the calculated floating-point value) (in theory correct rounding can be achieved, but with big implementation cost, see crlibm)
Binary representation of floating-point numbers matters because bit hacks are often needed in the math code. (in particular bit hacks are used instead of relational operations for nan and sign checks, because relational operators raise the invalid exception on nan and treat -0.0 and +0.0 as equal, and more often than not these are not desired)
float and double bit manipulation can be handled in a portable way in c using union types:
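A minimal sketch (the asuint64 and asdouble helper names are made up here for illustration; musl's actual macros differ):

	#include <stdint.h>

	/* union type punning: both members have the same size and no padding */
	static inline uint64_t asuint64(double x)
	{
		union { double f; uint64_t i; } u = { x };
		return u.i;
	}

	static inline double asdouble(uint64_t i)
	{
		union { uint64_t i; double f; } u = { i };
		return u.f;
	}

	/* eg. a silent sign check: int sign = asuint64(x) >> 63; */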
long double bit manipulation is harder as there are various representations and some of them don't map to any unsigned integer type:
In case of ld64 the bit manipulation is the same as with double, and all long double math functions can be thin wrappers around the corresponding double ones. (using symbol aliasing at the linker level would be non-conformant, since the two functions would then not have unique addresses)
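For example (a sketch; the real wrappers have to cover every math function):

	#include <math.h>

	/* ld64: long double has the same representation as double,
	   so the double function already gives correct results */
	long double sinl(long double x)
	{
		return sin(x);
	}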
ld80 is the most common long double on linux (i386 and x86_64 abi): a 64bit significand with explicit msb (inconsistent with other ieee formats), 15bit exponent, 1 sign bit. The m68k (and m88k) architectures use the same format with different endianness:
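A sketch of an unpacking union for the little-endian case (the union and field names are illustrative; musl's actual union differs):

	#include <stdint.h>

	union ldshape80 {
		long double f;
		struct {
			uint64_t m;   /* significand, explicit msb */
			uint16_t se;  /* sign:1 exponent:15 */
		} i;
	};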
ld128 is rare (eg. sparc64 with software emulation): a 113bit significand with implicit msb, 15bit exponent, 1 sign bit:
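A sketch for the little-endian case (again with illustrative names):

	#include <stdint.h>

	union ldshape128 {
		long double f;
		struct {
			uint64_t lo;  /* low 64 bits of the significand */
			uint32_t mid; /* next 32 bits */
			uint16_t top; /* top 16 bits, the msb is implicit */
			uint16_t se;  /* sign:1 exponent:15 */
		} i;
	};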
There are other non-conformant long double types: eg. the old SVR4 abi for ppc uses 128bit long doubles, but they are software emulated and traditionally implemented as a pair of doubles (also called ibm long double, as this is what ibm aix used on ppc). The ibm s390 supports the ieee 754-2008 compliant binary128 floating-point format, but earlier ibm machines (S/370, S/360) used a slightly different representation.
This variation shows how hard it is to handle long double consistently: the solution is to use ifdefs based on float.h and on the endianness, and write different code for different architectures.
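A sketch of such a dispatch, assuming the usual float.h values of the three formats:

	#include <float.h>

	#if LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024
	/* ld64: long double is the same as double */
	#elif LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384
	/* ld80: x87 extended precision (endianness checked separately) */
	#elif LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384
	/* ld128: ieee binary128 */
	#else
	#error unsupported long double representation
	#endif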
The ugly parts of libm hacking.
Some notes are from: http://www.vinc17.org/research/extended.en.html
If a value is rounded twice, the result can differ from rounding it just once.
The C language allows arithmetic to be evaluated in higher precision than the operands have. If the x87 fpu is used in extended precision mode, it rounds the results twice: once to 80bit when calculating and then to 64bit when storing, which can give a different result than a single 64bit rounding. (on x86 linux the default fpu setting is to round the results in extended precision; this only affects x87 instructions, not sse2 etc) (freebsd and openbsd set double precision by default)
So x = a+b may give different results depending on the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
(double rounding can happen with float vs double as well)
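A small demo of the difference (assumes gcc on x86 linux generating x87 code, eg. -m32 without sse, with the default extended precision setting):

	#include <stdio.h>

	int main(void)
	{
		volatile double a = 1.0, b = 0x1p-53 + 0x1p-64;
		volatile double x = a + b;
		/* exact sum: 1 + 2^-53 + 2^-64
		   rounded once to double: 0x1.0000000000001p+0
		   x87 rounds to extended first (giving 1 + 2^-53),
		   then the store rounds to double: 0x1p+0 */
		printf("%a\n", x);
		return 0;
	}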
C99 annex F prohibits double rounding, but annex F is optional: it only applies to implementations that define __STDC_IEC_559__.
Even if the fpu is set to double precision (which is not the default on linux), the x87 registers use a wider exponent range (significand:exponent is 53:15 bits instead of 53:11), so underflows (subnormal results) may not be treated as expected. Rounding to double only occurs when a value is stored into memory.
Actually this behaviour is allowed by the ieee 754 standard, but it can cause problems (see the infamous php bug).
C does not require consistent evaluation precision: the compiler may store some intermediate results and round them to double while keeping other parts in higher precision. (And the precision the compiler chooses can be inconsistent: adding a printf to the code may change the result of a nearby calculation.) So
(a+b) == (a+b) may be false when the two sides are kept in different precision. (This is not an x87 specific problem: it matters whenever there is a higher precision fp type than the one currently used. It goes away if the highest precision (long double) is used everywhere, but that can have a huge performance penalty.)
C99 has a way to control this (see 5.1.2.3 example 4, 6.3.1.5 and 6.3.1.8): when a result is stored in a variable or a type cast is used, then it is guaranteed that the precision is appropriate to that type. So in
(double)(a+b) == (double)(a+b) both sides are guaranteed to be in double precision when the comparison is done.
(This still does not solve the x87 double rounding issue though: eg if the left side is evaluated with sse2 and the right side with x87 extended precision setting and double rounding then the result may still be false)
Unfortunately gcc does not respect the standard: even if an assignment or a cast is used, the result may be kept in higher precision (the infamous gcc bug323, also see gcc bug36578). gcc 4.5 fixed this with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast').
The workaround for older gcc is to force the compiler to store intermediate results: either by using volatile double temporary variables or with '-ffloat-store' (which affects all intermediate results, but is not guaranteed by the gcc docs to always work).
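A sketch of the volatile workaround (dadd is just an example name):

	/* without the volatile, gcc may keep a+b in an x87 register
	   in extended precision; the store forces rounding to double */
	double dadd(double a, double b)
	{
		volatile double r = a + b;
		return r;
	}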
(Sometimes the excess precision is useful, but it is hard to rely on since it is optional. It can be checked for using FLT_EVAL_METHOD, float_t and double_t, but it's probably easier to use higher precision variables explicitly when that's what we want, and not depend on the implicit excess precision.)
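A sketch of using double_t for temporaries (the sum function is just an example):

	#include <math.h>  /* float_t and double_t match the evaluation precision */

	double sum(const double *p, int n)
	{
		double_t s = 0;  /* long double when FLT_EVAL_METHOD == 2 */
		for (int i = 0; i < n; i++)
			s += p[i];
		return s;  /* a single rounding to double happens here */
	}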
Runtime and compile time semantics may be different: gcc often does unwanted or even invalid compile time optimizations. (constant folding where floating-point exception flags should be raised at runtime, or where the result depends on the runtime floating-point environment, or constant expressions just evaluated in a different precision than at runtime)
C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). Unfortunately gcc does not support these pragmas.
FENV_ACCESS ON tells the compiler that the code wants to access the floating-point environment (eg. to set a different rounding mode) (see gcc bug34678).
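A sketch of code that needs the pragma; without it the compiler may fold or move the arithmetic across the fesetround calls (add_down is just an example name):

	#include <fenv.h>
	#pragma STDC FENV_ACCESS ON  /* not supported by gcc */

	double add_down(double a, double b)
	{
		int r = fegetround();
		fesetround(FE_DOWNWARD);
		double x = a + b;  /* should round toward -inf */
		fesetround(r);
		return x;
	}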
(see gcc bug37845 for the FP_CONTRACT pragma, which is relevant when the architecture has an fma instruction; x86_64, i386 and arm do not have it)
The workaround is again using named volatile variables for constants, like
static const volatile double two52 = 0x1p52;
and using the '-frounding-math' gcc flag.
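Such a constant is typically used for its rounding side effect; a sketch, valid for 0 <= x < 0x1p52 in the default rounding mode (round_to_int is an illustrative name):

	static const volatile double two52 = 0x1p52;

	/* the add rounds away the fraction bits (raising the inexact
	   flag if x was not an integer), the subtract recovers the
	   integer; the volatile keeps gcc from folding this away */
	double round_to_int(double x)
	{
		return (x + two52) - two52;
	}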
According to the freebsd libm code, gcc truncates long double const literals on i386. I assume this happens because freebsd uses 64bit precision long doubles by default (double precision mode) and gcc incorrectly uses the precision setting of the host platform instead of the target one, but I did not observe this on linux. (as a workaround double-double arithmetic was sometimes used to initialize long doubles on i386, but most of these should be fixed in musl's math code now)
The two representations (ld80 and ld128) are sufficiently different that treating them together is awkward. (Especially the explicit msb in ld80 can cause different behaviour.)
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately.
Signed zeros can be tricky. They cannot be distinguished using the usual comparison operators (+0.0 == -0.0 is true), but they give different results in some cases (1/-0.0 == -inf) and can make similar-looking expressions different: eg. (x + 0) is not the same as x, the former is +0.0 when x is -0.0.
(To check for -0, the signbit macro can be used or the copysign function or bit manipulation.)
Arithmetic may set various floating-point exception flags as a side effect. These can be queried and manipulated (fetestexcept, feraiseexcept, ..).
So special care is needed when a library function wants to avoid changing the floating point status flags. eg. if one wants to check for -0 silently then
if (x == 0.0 && 1/x < 0) { /* x is a -0 */ }
is not ok: the comparison may raise the invalid flag when x is nan (strictly, == is only required to raise it for signaling nan), and even if nans are filtered out first, 1/x raises the divbyzero flag.
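A silent check can be done with bit manipulation instead (is_neg_zero is an illustrative name):

	#include <string.h>
	#include <stdint.h>

	static int is_neg_zero(double x)
	{
		/* looking at the bits raises no exception flags */
		uint64_t u;
		memcpy(&u, &x, sizeof u);
		return u == (uint64_t)1 << 63;
	}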
When a library wants to raise a flag deliberately but feraiseexcept is not available for some reason, then simple arithmetic can be used just for its exception raising side effect (eg. 1/0.0 to raise divbyzero); however, beware of compiler optimizations (constant folding, dead code elimination, ..).
Unfortunately gcc does not always take fp exceptions into account: a simple x = 1e300*1e300; may not raise the overflow exception at runtime, but get optimized into x = +inf. See the compiler optimization issues above.
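A sketch of raising flags through arithmetic in a way gcc cannot fold away (the function names are illustrative):

	/* the volatile constants hide the values from the optimizer,
	   so the operations happen (and raise flags) at runtime */
	static const volatile double zero = 0, huge = 0x1p1000;

	static void raise_divbyzero(void)
	{
		volatile double v = 1.0 / zero;
		(void)v;
	}

	static void raise_overflow(void)
	{
		volatile double v = huge * huge;
		(void)v;
	}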
Another x87 gcc bug related to fp exceptions: in some cases comparison operators (==, <, etc) don't raise invalid when an operand is nan, even though ieee 754 and c99 annex F require it (for == only in the signaling nan case). (see gcc bug52451)
The ieee standard defines signaling as well as quiet nans. The c99 standard only considers quiet nans, but it allows signaling nans to be supported too. Without signaling nans, x * 1 is equivalent to x; if signaling nans are supported, the former raises an invalid exception. This can complicate things further when writing portable fp math code.
A further libm design issue is the math_errhandling macro: it specifies the way math function errors can be checked, which is either fp exceptions or the errno variable or both. The fp exception approach can be supported with enough care, but errno is hard to support: certain library functions are implemented as a single asm instruction (eg. sqrt), so the only way to set errno is to query the fp exception flags and then set the errno variable based on that. So even though errno may be convenient, in libm it is not the right thing to do.
For soft-float targets, however, errno seems to be the only option (which means annex F cannot be fully supported, as it requires the support of exception flags). The problem is that the fpu status should be saved and restored at context switches, which the kernel does on hard-fp architectures where the state is in an fpu status word. With soft-fp emulation the c runtime must do this itself: context switches between threads can be handled by keeping the exception state in thread local storage, but signal handlers may do floating-point arithmetic and should not alter the fenv state. Wrapping signal handlers is difficult or impossible for various reasons, and the compiler cannot know which functions will be used as signal handlers, so the c runtime has no way to guarantee that signal handlers do not alter the fenv.
gcc turns x*I into (x + 0*I)*(0 + I) = 0*x + x*I, so if x = inf then the result is nan+inf*I instead of inf*I (an fmul instruction is generated for the 0*x).
(There were various complex constant folding issues as well in gcc).
So a+b*I cannot be used to create complex numbers; instead some double[2] manipulation should be used to make sure there is no fmul in the generated code (freebsd uses a static inline cpack function). (It's not clear which is better: using union hacks everywhere in the complex code, or creal, cimag and cpack.)
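A sketch of such a helper (c99 guarantees that a complex value has the same representation as an array of two reals; c11 later added the CMPLX macro for this purpose):

	#include <complex.h>

	static inline double complex cpack(double re, double im)
	{
		union { double complex z; double a[2]; } u;
		u.a[0] = re;  /* real part */
		u.a[1] = im;  /* imaginary part */
		return u.z;   /* no multiplication, so inf components survive */
	}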
Complex code is hard to make portable across compilers: compiler support is poor, complex math is rarely used, and the standard leaves various options open (complex support as a whole is optional, and the imaginary type is optional).
It's not clear how I (or _Complex_I) should be defined in complex.h (the literal for the float imaginary unit is compiler specific; in gcc it can be 1.0fi).
The freebsd libm code has many inconsistencies (naming conventions, 0x1p0 notation vs decimal notation, ..); one of them is the integer type used for bit manipulations: the bits of a double are unpacked into one of the int, int32_t, uint32_t and u_int32_t integer types.
int32_t is used most often, which is not wrong in itself, but it is used incorrectly in many places.
int is a bit worse because, unlike int32_t, it is not guaranteed to be a 32bit type with two's complement representation. (but of course in practice they are the same)
The issues found so far are: left shift of negative integers (undefined behaviour), right shift of negative integers (implementation defined behaviour), signed overflow (undefined behaviour) and unsigned to signed conversion (implementation defined behaviour).
It is easy to avoid these issues without performance impact, but a bit of care should be taken around bit manipulations.
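Eg. doing the manipulations on unsigned types avoids the problematic cases; a sketch for the high word of a double (the helper names are illustrative):

	#include <stdint.h>

	/* shift an unsigned value, then convert the small result:
	   no signed shift, no overflow */
	static int get_exp(uint32_t hi)
	{
		return (int)((hi >> 20) & 0x7ff) - 1023;
	}

	/* setting the sign bit with an unsigned or is well defined */
	static uint32_t set_sign(uint32_t hi)
	{
		return hi | 0x80000000u;
	}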
multiprecision libs (useful for tests)
other links