This page is about libm for the musl libc.
Writing math code from scratch is a huge amount of work, so already existing code is used. Several math functions are taken from the freebsd libm and a few from the openbsd libm implementations. Both of them are based on fdlibm. The freebsd libm seems to be the most well maintained and most correct version of fdlibm.
sources:
freebsd: svn checkout svn://svn.freebsd.org/base/head/lib/msun
openbsd: cvs -d $CVSROOT get src/lib/libm
|error| < 1.5 ulp should hold for most functions. (error is the difference between the exact result and the calculated floating-point value) (in theory correct rounding can be achieved, but with big implementation cost, see crlibm)
Binary representation of floating-point numbers matters because bit hacks are often needed in the math code. (in particular bit hacks are used instead of relational operations for nan and sign checks, because relational operators raise the invalid exception on nan and treat -0.0 and +0.0 as equal, and more often than not these are not desired)
float and double bit manipulation can be handled in a portable way in c using union types:
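A minimal sketch (the asuint64 and asdouble helper names are made up here for illustration; musl's actual macros differ):

	#include <stdint.h>

	/* union type punning: both members have the same size and no padding */
	static inline uint64_t asuint64(double x)
	{
		union { double f; uint64_t i; } u = { x };
		return u.i;
	}

	static inline double asdouble(uint64_t i)
	{
		union { uint64_t i; double f; } u = { i };
		return u.f;
	}

	/* eg. a silent sign check: int sign = asuint64(x) >> 63; */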
long double bit manipulation is harder as there are various representations and some of them don't map to any unsigned integer type:
In case of ld64 the bit manipulation is the same as with double, and all long double math functions can be thin wrappers around the corresponding double ones. (using symbol aliasing at the linker level would be non-conformant, since the two functions would then not have unique addresses)
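For example (a sketch; the real wrappers have to cover every math function):

	#include <math.h>

	/* ld64: long double has the same representation as double,
	   so the double function already gives correct results */
	long double sinl(long double x)
	{
		return sin(x);
	}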
ld80 is the most common long double on linux (i386 and x86_64 abi): a 64bit significand with explicit msb (inconsistent with other ieee formats), 15bit exponent, 1 sign bit. The m68k (and m88k) architectures use the same format with different endianness:
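A sketch of an unpacking union for the little-endian case (the union and field names are illustrative; musl's actual union differs):

	#include <stdint.h>

	union ldshape80 {
		long double f;
		struct {
			uint64_t m;   /* significand, explicit msb */
			uint16_t se;  /* sign:1 exponent:15 */
		} i;
	};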
ld128 is rare (eg. sparc64 with software emulation): a 113bit significand with implicit msb, 15bit exponent, 1 sign bit:
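A sketch for the little-endian case (again with illustrative names):

	#include <stdint.h>

	union ldshape128 {
		long double f;
		struct {
			uint64_t lo;  /* low 64 bits of the significand */
			uint32_t mid; /* next 32 bits */
			uint16_t top; /* top 16 bits, the msb is implicit */
			uint16_t se;  /* sign:1 exponent:15 */
		} i;
	};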
There are other non-conformant long double types: eg. the old SVR4 abi for ppc uses 128bit long doubles, but they are software emulated and traditionally implemented as a pair of doubles (also called ibm long double, as this is what ibm aix used on ppc). The ibm s390 supports the ieee 754-2008 compliant binary128 floating-point format, but earlier ibm machines (S/370, S/360) used a slightly different representation.
This variation shows how hard it is to handle long double consistently: the solution is to use ifdefs based on float.h and on the endianness, and write different code for different architectures.
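A sketch of such a dispatch, assuming the usual float.h values of the three formats:

	#include <float.h>

	#if LDBL_MANT_DIG == 53 && LDBL_MAX_EXP == 1024
	/* ld64: long double is the same as double */
	#elif LDBL_MANT_DIG == 64 && LDBL_MAX_EXP == 16384
	/* ld80: x87 extended precision (endianness checked separately) */
	#elif LDBL_MANT_DIG == 113 && LDBL_MAX_EXP == 16384
	/* ld128: ieee binary128 */
	#else
	#error unsupported long double representation
	#endif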
The ugly parts of libm hacking.
Some notes are from: http://www.vinc17.org/research/extended.en.html
If a value is rounded twice, the result can differ from rounding it just once.
The C language allows arithmetic to be evaluated in higher precision than the operands have. If the x87 fpu is used in extended precision mode, it rounds the results twice: once to 80bit when calculating and then to 64bit when storing, which can give a different result than a single 64bit rounding. (on x86 linux the default fpu setting is to round the results in extended precision; this only affects x87 instructions, not sse2 etc) (freebsd and openbsd set double precision by default)
So x = a+b may give different results depending on the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
(double rounding can happen with float vs double as well)
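A small demo of the difference (assumes gcc on x86 linux generating x87 code, eg. -m32 without sse, with the default extended precision setting):

	#include <stdio.h>

	int main(void)
	{
		volatile double a = 1.0, b = 0x1p-53 + 0x1p-64;
		volatile double x = a + b;
		/* exact sum: 1 + 2^-53 + 2^-64
		   rounded once to double: 0x1.0000000000001p+0
		   x87 rounds to extended first (giving 1 + 2^-53),
		   then the store rounds to double: 0x1p+0 */
		printf("%a\n", x);
		return 0;
	}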
C99 annex F prohibits double rounding, but annex F is optional: it only applies to implementations that define __STDC_IEC_559__.
Even if the fpu is set to double precision (which is not the default on linux), the x87 registers use a wider exponent range (significand:exponent is 53:15 bits instead of 53:11), so underflows (subnormal results) may not be treated as expected. Rounding to double only occurs when a value is stored into memory.
Actually this behaviour is allowed by the ieee 754 standard, but it can cause problems (see the infamous php bug).
C does not require consistent evaluation precision: the compiler may store some intermediate results and round them to double while keeping other parts in higher precision. (And the precision the compiler chooses can be inconsistent: adding a printf to the code may change the result of a nearby calculation.) So
(a+b) == (a+b) may be false when the two sides are kept in different precision. (This is not an x87 specific problem: it matters whenever there is a higher precision fp type than the one currently used. It goes away if the highest precision (long double) is used everywhere, but that can have a huge performance penalty.)
C99 has a way to control this (see 5.1.2.3 example 4, 6.3.1.5 and 6.3.1.8): when a result is stored in a variable or a type cast is used, then it is guaranteed that the precision is appropriate to that type. So in
(double)(a+b) == (double)(a+b) both sides are guaranteed to be in double precision when the comparison is done.
(This still does not solve the x87 double rounding issue though: eg if the left side is evaluated with sse2 and the right side with x87 extended precision setting and double rounding then the result may still be false)
Unfortunately gcc does not respect the standard: even if an assignment or a cast is used, the result may be kept in higher precision (the infamous gcc bug323, also see gcc bug36578). gcc 4.5 fixed this with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast').
The workaround for older gcc is to force the compiler to store intermediate results: either by using volatile double temporary variables or with '-ffloat-store' (which affects all intermediate results, but is not guaranteed by the gcc docs to always work).
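A sketch of the volatile workaround (dadd is just an example name):

	/* without the volatile, gcc may keep a+b in an x87 register
	   in extended precision; the store forces rounding to double */
	double dadd(double a, double b)
	{
		volatile double r = a + b;
		return r;
	}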
(Sometimes the excess precision is useful, but it is hard to rely on since it is optional. It can be checked for using FLT_EVAL_METHOD, float_t and double_t, but it's probably easier to use higher precision variables explicitly when that's what we want, and not depend on the implicit excess precision.)
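A sketch of using double_t for temporaries (the sum function is just an example):

	#include <math.h>  /* float_t and double_t match the evaluation precision */

	double sum(const double *p, int n)
	{
		double_t s = 0;  /* long double when FLT_EVAL_METHOD == 2 */
		for (int i = 0; i < n; i++)
			s += p[i];
		return s;  /* a single rounding to double happens here */
	}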
Runtime and compile time semantics may be different: gcc often does unwanted or even invalid compile time optimizations. (constant folding where floating-point exception flags should be raised at runtime, or where the result depends on the runtime floating-point environment, or constant expressions just evaluated in a different precision than at runtime)
C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). Unfortunately gcc does not support these pragmas.
FENV_ACCESS ON tells the compiler that the code wants to access the floating-point environment (eg. to set a different rounding mode) (see gcc bug34678).
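A sketch of code that needs the pragma; without it the compiler may fold or move the arithmetic across the fesetround calls (add_down is just an example name):

	#include <fenv.h>
	#pragma STDC FENV_ACCESS ON  /* not supported by gcc */

	double add_down(double a, double b)
	{
		int r = fegetround();
		fesetround(FE_DOWNWARD);
		double x = a + b;  /* should round toward -inf */
		fesetround(r);
		return x;
	}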
(see gcc bug37845 for the FP_CONTRACT pragma, which is relevant when the architecture has an fma instruction; x86_64, i386 and arm do not have it)
The workaround is again using named volatile variables for constants, like
static const volatile double two52 = 0x1p52;
and using the '-frounding-math' gcc flag.
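Such a constant is typically used for its rounding side effect; a sketch, valid for 0 <= x < 0x1p52 in the default rounding mode (round_to_int is an illustrative name):

	static const volatile double two52 = 0x1p52;

	/* the add rounds away the fraction bits (raising the inexact
	   flag if x was not an integer), the subtract recovers the
	   integer; the volatile keeps gcc from folding this away */
	double round_to_int(double x)
	{
		return (x + two52) - two52;
	}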
According to the freebsd libm code, gcc truncates long double const literals on i386. I assume this happens because freebsd uses 64bit precision long doubles by default (double precision mode) and gcc incorrectly uses the precision setting of the host platform instead of the target one, but I did not observe this on linux. (as a workaround double-double arithmetic was sometimes used to initialize long doubles on i386, but most of these should be fixed in musl's math code now)
The two representations (ld80 and ld128) are sufficiently different that treating them together is awkward. (Especially the explicit msb in ld80 can cause different behaviour.)
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately.
Signed zeros can be tricky. They cannot be distinguished using the usual comparison operators (+0.0 == -0.0 is true), but they give different results in some cases (1/-0.0 == -inf) and can make similar-looking expressions different: eg. (x + 0) is not the same as x, the former is +0.0 when x is -0.0.
(To check for -0, the signbit macro can be used or the copysign function or bit manipulation.)
Arithmetic may set various floating-point exception flags as a side effect. These can be queried and manipulated (fetestexcept, feraiseexcept, ..).
So special care is needed when a library function wants to avoid changing the floating point status flags. eg. if one wants to check for -0 silently then
if (x == 0.0 && 1/x < 0) { /* x is a -0 */ }
is not ok: the comparison may raise the invalid flag when x is nan (strictly, == is only required to raise it for signaling nan), and even if nans are filtered out first, 1/x raises the divbyzero flag.
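A silent check can be done with bit manipulation instead (is_neg_zero is an illustrative name):

	#include <string.h>
	#include <stdint.h>

	static int is_neg_zero(double x)
	{
		/* looking at the bits raises no exception flags */
		uint64_t u;
		memcpy(&u, &x, sizeof u);
		return u == (uint64_t)1 << 63;
	}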
When a library wants to raise a flag deliberately but feraiseexcept is not available for some reason, then simple arithmetic can be used just for its exception raising side effect (eg. 1/0.0 to raise divbyzero); however, beware of compiler optimizations (constant folding, dead code elimination, ..).
Unfortunately gcc does not always take fp exceptions into account: a simple x = 1e300*1e300; may not raise the overflow exception at runtime, but get optimized into x = +inf. See the compiler optimization issues above.
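A sketch of raising flags through arithmetic in a way gcc cannot fold away (the function names are illustrative):

	/* the volatile constants hide the values from the optimizer,
	   so the operations happen (and raise flags) at runtime */
	static const volatile double zero = 0, huge = 0x1p1000;

	static void raise_divbyzero(void)
	{
		volatile double v = 1.0 / zero;
		(void)v;
	}

	static void raise_overflow(void)
	{
		volatile double v = huge * huge;
		(void)v;
	}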
Another x87 gcc bug related to fp exceptions: in some cases comparison operators (==, <, etc) don't raise invalid when an operand is nan, even though ieee 754 and c99 annex F require it (for == only in the signaling nan case). (see gcc bug52451)
The ieee standard defines signaling as well as quiet nans. The c99 standard only considers quiet nans, but it allows signaling nans to be supported too. Without signaling nans, x * 1 is equivalent to x; if signaling nans are supported, the former raises an invalid exception. This can complicate things further when writing portable fp math code.
A further libm design issue is the math_errhandling macro: it specifies the way math function errors can be checked, which is either fp exceptions or the errno variable or both. The fp exception approach can be supported with enough care, but errno is hard to support: certain library functions are implemented as a single asm instruction (eg. sqrt), so the only way to set errno is to query the fp exception flags and then set the errno variable based on that. So even though errno may be convenient, in libm it is not the right thing to do.
For soft-float targets, however, errno seems to be the only option (which means annex F cannot be fully supported, as it requires the support of exception flags). The problem is that the fpu status should be saved and restored at context switches, which the kernel does on hard-fp architectures where the state is in an fpu status word. With soft-fp emulation the c runtime must do this itself: context switches between threads can be handled by keeping the exception state in thread local storage, but signal handlers may do floating-point arithmetic and should not alter the fenv state. Wrapping signal handlers is difficult or impossible for various reasons, and the compiler cannot know which functions will be used as signal handlers, so the c runtime has no way to guarantee that signal handlers do not alter the fenv.
gcc turns x*I into (x + 0*I)*(0 + I) = 0*x + x*I, so if x = inf then the result is nan+inf*I instead of inf*I (an fmul instruction is generated for the 0*x).
(There were various complex constant folding issues as well in gcc).
So a+b*I cannot be used to create complex numbers; instead some double[2] manipulation should be used to make sure there is no fmul in the generated code (freebsd uses a static inline cpack function). (It's not clear which is better: using union hacks everywhere in the complex code, or creal, cimag and cpack.)
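A sketch of such a helper (c99 guarantees that a complex value has the same representation as an array of two reals; c11 later added the CMPLX macro for this purpose):

	#include <complex.h>

	static inline double complex cpack(double re, double im)
	{
		union { double complex z; double a[2]; } u;
		u.a[0] = re;  /* real part */
		u.a[1] = im;  /* imaginary part */
		return u.z;   /* no multiplication, so inf components survive */
	}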
Complex code is hard to make portable across compilers: compiler support is poor, complex math is rarely used, and the standard leaves various options open (complex support as a whole is optional, and the imaginary type is optional).
It's not clear how I (or _Complex_I) should be defined in complex.h (the literal for the float imaginary unit is compiler specific; in gcc it can be 1.0fi).
The freebsd libm code has many inconsistencies (naming conventions, 0x1p0 notation vs decimal notation, ..); one of them is the integer type used for bit manipulations: the bits of a double are unpacked into one of the int, int32_t, uint32_t and u_int32_t integer types.
int32_t is used most often, which is not wrong in itself, but it is used incorrectly in many places.
int is a bit worse because, unlike int32_t, it is not guaranteed to be a 32bit type with two's complement representation. (but of course in practice they are the same)
The issues found so far are: left shift of negative integers (undefined behaviour), right shift of negative integers (implementation defined behaviour), signed overflow (undefined behaviour) and unsigned to signed conversion (implementation defined behaviour).
It is easy to avoid these issues without performance impact, but a bit of care should be taken around bit manipulations.
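Eg. doing the manipulations on unsigned types avoids the problematic cases; a sketch for the high word of a double (the helper names are illustrative):

	#include <stdint.h>

	/* shift an unsigned value, then convert the small result:
	   no signed shift, no overflow */
	static int get_exp(uint32_t hi)
	{
		return (int)((hi >> 20) & 0x7ff) - 1023;
	}

	/* setting the sign bit with an unsigned or is well defined */
	static uint32_t set_sign(uint32_t hi)
	{
		return hi | 0x80000000u;
	}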
multiprecision libs (useful for tests)
other links