This page is about designing libm for the musl libc.
Writing the math code from scratch would be a huge amount of work, so already existing code is used.
The math code is mostly from freebsd which in turn is based on fdlibm.
sources:
svn checkout svn://svn.freebsd.org/base/head/lib/msun
cvs -d $CVSROOT get src/lib/libm
The math code is available but there are still many design questions.
There is architecture specific code, long double representation specific code and complex math code.
With respect to long double, bsd libm code can handle ld64, ld80 and ld128 (where ld64 means long double == double), for musl it's enough to support ld64 and ld80, but keeping ld128 shouldn't hurt much either.
So far musl has had arch specific code and common code; it seems a new category is needed: long double specific code, which can be ld64, ld80 or ld128. These should be tied to the architecture in the build system somehow. In freebsd most of the long double code is common (using arch specific macros and ifdefs) and only a small part of the ld code is maintained separately for ld80 and ld128, but it's also possible to keep all the ld code separately (with duplication) in ld80 and ld128 versions.
Complex support is optional in c99, so maybe it should be optional in the musl build as well. The complex code can be kept separately in cmath/ or together with the rest of the code in math/.
Pending questions:
The bsd libm code has many workarounds for various compiler issues. It's not clear what the best way to handle the ugliness is.
The fdlibm code style is inconsistent with musl's (and ugly), so it makes sense to clean it up, but keeping easy diffability can be useful. There are various inconsistencies and correctness issues in the code which might be worth addressing (eg. reliance on signed overflow should be eliminated).
Binary representation of floating point numbers matters because lots of bit hacks are needed in the math code.
float and double bit manipulation can be handled in a portable way in c:
The endianness may still vary, but that can be worked around by using a union with a single large enough unsigned int. (The only exception is probably arm/oabi fpa where the word order of a double is not consistent with the endianness [debian wiki on arm], but we probably won't support that)
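As a sketch (the function name is ours, not musl's), the union approach for double looks like this:

```c
#include <stdint.h>

/* View the bits of a double through a union with a single 64-bit
 * unsigned int: the word order problem disappears because the
 * integer is exactly as large as the double. */
static inline uint32_t double_hi_bits(double x)
{
	union { double f; uint64_t i; } u = { x };
	return u.i >> 32; /* sign, exponent and top of the mantissa */
}
```

For example double_hi_bits(1.0) is 0x3ff00000 on any ieee system, regardless of endianness.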
long double bit manipulation is harder as there are various representations:
ld64 is easy to handle: all long double functions are aliased to the corresponding double one (long double is treated equivalently to double)
ld80 is the most common (i386, x86_64), it means 64bit significand with explicit msb (inconsistent with other ieee formats), 15bit exp, 1 sign bit.
ld128 is rare (sparc64 with software emulation), it means 113bit significand with implicit msb, 15bit exp, 1 sign bit.
Endianness can vary (although on the supported i386 and x86_64 it is the same) and there is no large enough unsigned int to handle it. (In case of ld80 internal padding can vary as well, eg m68k and m88k cpus use different ld80 than the intel ones)
The ugly parts of libm hacking.
Some notes are from: http://www.vinc17.org/research/extended.en.html
If a value is rounded twice, the result can differ from rounding it just once.
The C language allows arithmetic to be evaluated in higher precision than the operands have. If we use the x87 fpu in extended precision mode it will round the results twice: round to 80bit when calculating and then round to 64bit when storing, which can give a different result than a single 64bit rounding. (On x86-linux the default fpu setting is to round the results in extended precision; this only affects x87 instructions, not sse2 etc.) (afaik freebsd and openbsd use double precision by default.)
So x = a+b may give different results depending on the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
(double rounding can happen with float vs double as well)
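The float/double case can be demonstrated in standard C; the constants below are ours, chosen so the effect is visible:

```c
/* The exact value 1 + 2^-24 + 2^-60 rounded directly to float gives
 * 0x1.000002p0f, but rounding through double first loses the 2^-60 bit
 * (it is below half an ulp of double), leaving an exact tie that then
 * rounds to even: 1.0f. */
static float via_double(void)
{
	double d = (1.0 + 0x1p-24) + 0x1p-60; /* d == 1 + 2^-24 exactly */
	return (float)d; /* tie between 1 and 1 + 2^-23, rounds down to 1.0f */
}
```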
C99 annex F prohibits double rounding, but annex F only applies to implementations that define __STDC_IEC_559__.
Even if the fpu is set to double precision (which it is not by default on linux), the x87 registers still use a wider exponent range (mant:exp is 53:15 bits instead of 53:11), so underflows (subnormals) may not be treated as expected. Rounding to double only occurs when a value is stored into memory.
This behaviour is actually allowed by the ieee 754 standard, but it can cause problems (see the infamous php bug).
C does not require consistent evaluation precision: the compiler may store intermediate results and round them to double while keeping other parts in higher precision. So (a+b) == (a+b) may be false when the two sides are kept in different precision. (This is not an x87 specific problem: it matters whenever there is a higher precision fp type than the one currently used. It goes away if the highest precision type (long double) is used everywhere, but that can have a huge performance penalty.)
C99 has a way to control this (see 5.1.2.3 example 4, 6.3.1.5 and 6.3.1.8): when a result is stored in a variable or a type cast is used, the precision is guaranteed to be appropriate to that type. So in (double)(a+b) == (double)(a+b) both sides are guaranteed to be in double precision when the comparison is done.
(This still does not solve the x87 double rounding issue though: eg if the left side is evaluated with sse2 and the right side with x87 extended precision setting and double rounding then the result may be false and still conformant to C99)
Unfortunately gcc does not respect the standard: even if an assignment or cast is used, the result may be kept in higher precision (the infamous gcc bug323; also see gcc bug36578). gcc 4.5 fixed this with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast').
The workaround for older gcc is to force the compiler to store the intermediate results: by using volatile double temporary variables or by '-ffloat-store' (slow, and of course all intermediate results should be assigned to some variable, casting is not enough).
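A sketch of the volatile-store workaround (the function is hypothetical, just to show the shape):

```c
/* Force the sum to be rounded to double precision even on old gcc/x87:
 * the store to a volatile object cannot be kept in a register, so the
 * value is narrowed to a real 64-bit double. */
static double add_rounded(double a, double b)
{
	volatile double r = a + b;
	return r;
}
```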
(Sometimes the excess precision is useful, but it's hard to rely on it as it is optional. There is a way to check for it using FLT_EVAL_METHOD, float_t and double_t, but it's probably easier to use higher precision variables explicitly when that's what we want and not depend on the implicit excess precision.)
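For reference, the check looks like this (a sketch; FLT_EVAL_METHOD, float_t and double_t are standard C99):

```c
#include <float.h>
#include <math.h>

/* FLT_EVAL_METHOD:
 *  0 - each operation is evaluated in its own type (e.g. sse2)
 *  1 - float and double operations are evaluated as double
 *  2 - everything is evaluated as long double (e.g. x87 default)
 * float_t and double_t are the types actually used for evaluation,
 * so arithmetic carried out in those types has no excess precision. */
static int excess_precision_possible(void)
{
	return FLT_EVAL_METHOD != 0; /* also true for -1 (indeterminable) */
}
```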
Runtime and compile time semantics may differ: gcc often does unwanted or even invalid compile time optimizations (constant folding where floating point exception flags should be raised at runtime, or where the result depends on the runtime floating point environment, or constant expressions are simply evaluated with different precision than at runtime).
C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). Unfortunately gcc does not support these pragmas.
FENV_ACCESS ON tells the compiler that the code wants to access the floating point environment (eg. set different rounding mode) (see gcc bug34678).
(see gcc bug37845 for FP_CONTRACT pragma).
The workaround is again to use named volatile variables for constants, like
static const volatile double two52 = 0x1p52;
and to use the '-frounding-math' gcc flag.
(According to the freebsd libm code, gcc truncates long double constant literals on i386. I haven't verified whether this is still the case, but as a workaround double-double arithmetic is sometimes used: the long double constant is initialized from two doubles.)
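A sketch of that workaround; the hi/lo values below are the standard pi/2 split used in fdlibm (e_asin.c):

```c
/* pi/2 split into a double and a double-sized correction term */
static const double pio2_hi = 1.57079632679489655800e+00;
static const double pio2_lo = 6.12323399573676603587e-17;

/* The cast makes the addition happen in long double precision, so no
 * long double literal (which gcc may truncate) is needed. */
static long double pio2(void)
{
	return (long double)pio2_hi + pio2_lo;
}
```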
The two representations are sufficiently different that treating them together is awkward.
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately.
The freebsd libm code has many inconsistencies (naming conventions, 0x1p0 notation vs decimal notation, ..); one of them is the integer type used for bit manipulations: the bits of a double are unpacked into one of the int32_t, uint32_t and u_int32_t integer types.
int32_t is used most often, which is wrong because the representation of signed int is implementation defined.
In general signed int is not handled carefully in the libm code: scalbn even depends on signed int overflow.
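For illustration, the exponent manipulation can be done with unsigned arithmetic only (a hypothetical sketch that, unlike the real scalbn, handles only normal numbers and small n):

```c
#include <stdint.h>

/* Multiply a normal double by 2^n by adjusting the exponent field
 * directly; no signed shifts or signed overflow involved.
 * No handling of overflow, subnormals, inf or nan. */
static double mul_pow2(double x, int n)
{
	union { double f; uint64_t i; } u = { x };
	u.i += (uint64_t)n << 52; /* exponent field is bits 52..62 */
	return u.f;
}
```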
gcc turns x*I into (x + 0*I)*(0 + I) = 0*x + x*I, so if x = inf then the result is nan + inf*I instead of inf*I (an fmul instruction is generated for 0*x).
(There were various complex constant folding issues as well in gcc).
So a+b*I cannot be used to create complex numbers, instead some double[2] manipulation should be used to make sure there is no fmul in the generated code (freebsd uses a static inline cpack function) (It's not clear which is better: using union hacks everywhere in the complex code or creal, cimag, cpack)
Complex code is hard to make portable across compilers as there is no good compiler support, it is rarely used, and the standard leaves several options open: complex support is optional and the imaginary type is optional.
It's not clear how I (or _Complex_I) should be defined in complex.h (the literal for the float imaginary unit is compiler specific; in gcc it can be written as 1.0fi).
multiprecision libs (useful for tests)
other links