This page is about designing libm for the musl libc.
Writing the math code from scratch would be a huge amount of work, so already existing code is used.
The math code is mostly from freebsd which in turn is based on fdlibm.
sources:
svn checkout svn://svn.freebsd.org/base/head/lib/msun
cvs -d $CVSROOT get src/lib/libm
The math code is available but there are still many design questions.
There is architecture specific code, long double representation specific code and complex math code.
With respect to long double, bsd libm code can handle ld64, ld80 and ld128 (where ld64 means long double == double), for musl it's enough to support ld64 and ld80, but keeping ld128 shouldn't hurt much either.
So far musl has had arch specific code and common code; it seems a new category is needed: long double specific code, which can be ld64, ld80 or ld128. These should be tied to the architecture in the build system somehow. In freebsd most of the long double code is common (using arch specific macros and ifdefs) and only a small part of the ld code is maintained separately for ld80 and ld128, but it's also possible to keep all the ld code separately (with duplication) in ld80 and ld128 versions.
Complex support is optional in c99, so maybe it should be optional in the musl build as well. The complex code can be kept separately in cmath/ or together with the rest of the code in math/.
Pending questions:
The bsd libm code has many workarounds for various compiler issues. It's not clear what the best way to handle the ugliness is.
The fdlibm code style is inconsistent with musl's (and ugly), so it makes sense to clean it up, but keeping easy diffability can be useful. There are various inconsistencies and correctness issues in the code which might be worth addressing (eg. reliance on signed overflow should be eliminated).
Binary representation of floating point numbers matters because lots of bit hacks are needed in the math code.
float and double bit manipulation can be handled in a portable way in c:
The endianness may still vary, but that can be worked around by using a union with a single large enough unsigned int. (The only exception is probably arm/oabi fpa where the word order of a double is not consistent with the endianness [debian wiki on arm], but we probably won't support that)
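As a sketch (the function name is ours, not musl's), the union approach for double looks like this:

```c
#include <stdint.h>

/* View the bits of a double through a union with a single 64-bit
 * unsigned int: the word order problem disappears because the
 * integer is exactly as large as the double. */
static inline uint32_t double_hi_bits(double x)
{
	union { double f; uint64_t i; } u = { x };
	return u.i >> 32; /* sign, exponent and top of the mantissa */
}
```

For example double_hi_bits(1.0) is 0x3ff00000 on any ieee system, regardless of endianness.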
long double bit manipulation is harder as there are various representations:
ld64 is easy to handle: all long double functions are aliased to the corresponding double one (long double is treated equivalently to double)
ld80 is the most common (i386, x86_64), it means 64bit significand with explicit msb (inconsistent with other ieee formats), 15bit exp, 1 sign bit.
ld128 is rare (sparc64 with software emulation), it means 113bit significand with implicit msb, 15bit exp, 1 sign bit.
Endianness can vary (although on the supported i386 and x86_64 it is the same) and there is no large enough unsigned int to handle it. (In case of ld80 internal padding can vary as well, eg m68k and m88k cpus use different ld80 than the intel ones)
The ugly parts of libm hacking.
Some notes are from: http://www.vinc17.org/research/extended.en.html
If a value is rounded twice, the result can differ from rounding it just once.
The C language allows arithmetic to be evaluated in higher precision than the operands have. If we use the x87 fpu in extended precision mode it will round the results twice: round to 80bit when calculating and then round to 64bit when storing, which can give a different result than a single 64bit rounding. (On x86-linux the default fpu setting is to round the results in extended precision; this only affects x87 instructions, not sse2 etc.) (afaik freebsd and openbsd use double precision by default.)
So x = a+b may give different results depending on the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
(double rounding can happen with float vs double as well)
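The float/double case can be demonstrated in standard C; the constants below are ours, chosen so the effect is visible:

```c
/* The exact value 1 + 2^-24 + 2^-60 rounded directly to float gives
 * 0x1.000002p0f, but rounding through double first loses the 2^-60 bit
 * (it is below half an ulp of double), leaving an exact tie that then
 * rounds to even: 1.0f. */
static float via_double(void)
{
	double d = (1.0 + 0x1p-24) + 0x1p-60; /* d == 1 + 2^-24 exactly */
	return (float)d; /* tie between 1 and 1 + 2^-23, rounds down to 1.0f */
}
```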
C99 annex F prohibits double rounding, but annex F only applies to implementations that define __STDC_IEC_559__.
Even if the fpu is set to double precision (which it is not by default on linux), the x87 registers still use a wider exponent range (mant:exp is 53:15 bits instead of 53:11), so underflows (subnormals) may not be treated as expected. Rounding to double only occurs when a value is stored into memory.
This behaviour is actually allowed by the ieee 754 standard, but it can cause problems (see the infamous php bug).
C does not require consistent evaluation precision: the compiler may store intermediate results and round them to double while keeping other parts in higher precision. So (a+b) == (a+b) may be false when the two sides are kept in different precision. (This is not an x87 specific problem: it matters whenever there is a higher precision fp type than the one currently used. It goes away if the highest precision type (long double) is used everywhere, but that can have a huge performance penalty.)
C99 has a way to control this (see 5.1.2.3 example 4, 6.3.1.5 and 6.3.1.8): when a result is stored in a variable or a type cast is used, the precision is guaranteed to be appropriate to that type. So in (double)(a+b) == (double)(a+b) both sides are guaranteed to be in double precision when the comparison is done.
(This still does not solve the x87 double rounding issue though: eg if the left side is evaluated with sse2 and the right side with x87 extended precision setting and double rounding then the result may be false and still conformant to C99)
Unfortunately gcc does not respect the standard: even if an assignment or cast is used, the result may be kept in higher precision (the infamous gcc bug323; also see gcc bug36578). gcc 4.5 fixed this with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast').
The workaround for older gcc is to force the compiler to store the intermediate results: by using volatile double temporary variables or by '-ffloat-store' (slow, and of course all intermediate results should be assigned to some variable, casting is not enough).
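A sketch of the volatile-store workaround (the function is hypothetical, just to show the shape):

```c
/* Force the sum to be rounded to double precision even on old gcc/x87:
 * the store to a volatile object cannot be kept in a register, so the
 * value is narrowed to a real 64-bit double. */
static double add_rounded(double a, double b)
{
	volatile double r = a + b;
	return r;
}
```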
(Sometimes the excess precision is useful, but it's hard to rely on it as it is optional. There is a way to check for it using FLT_EVAL_METHOD, float_t and double_t, but it's probably easier to use higher precision variables explicitly when that's what we want and not depend on the implicit excess precision.)
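For reference, the check looks like this (a sketch; FLT_EVAL_METHOD, float_t and double_t are standard C99):

```c
#include <float.h>
#include <math.h>

/* FLT_EVAL_METHOD:
 *  0 - each operation is evaluated in its own type (e.g. sse2)
 *  1 - float and double operations are evaluated as double
 *  2 - everything is evaluated as long double (e.g. x87 default)
 * float_t and double_t are the types actually used for evaluation,
 * so arithmetic carried out in those types has no excess precision. */
static int excess_precision_possible(void)
{
	return FLT_EVAL_METHOD != 0; /* also true for -1 (indeterminable) */
}
```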
Runtime and compile time semantics may differ: gcc often does unwanted or even invalid compile time optimizations (constant folding where floating point exception flags should be raised at runtime, or where the result depends on the runtime floating point environment, or constant expressions are simply evaluated with different precision than at runtime).
C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). Unfortunately gcc does not support these pragmas.
FENV_ACCESS ON tells the compiler that the code wants to access the floating point environment (eg. set different rounding mode) (see gcc bug34678).
(see gcc bug37845 for FP_CONTRACT pragma).
The workaround is again to use named volatile variables for constants, like
static const volatile double two52 = 0x1p52;
and to use the '-frounding-math' gcc flag.
(According to the freebsd libm code, gcc truncates long double constant literals on i386. I haven't verified whether this is still the case, but as a workaround double-double arithmetic is sometimes used: the long double constant is initialized from two doubles.)
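A sketch of that workaround; the hi/lo values below are the standard pi/2 split used in fdlibm (e_asin.c):

```c
/* pi/2 split into a double and a double-sized correction term */
static const double pio2_hi = 1.57079632679489655800e+00;
static const double pio2_lo = 6.12323399573676603587e-17;

/* The cast makes the addition happen in long double precision, so no
 * long double literal (which gcc may truncate) is needed. */
static long double pio2(void)
{
	return (long double)pio2_hi + pio2_lo;
}
```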
The two representations are sufficiently different that treating them together is awkward.
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately.
The freebsd libm code has many inconsistencies (naming conventions, 0x1p0 notation vs decimal notation, ..); one of them is the integer type used for bit manipulations: the bits of a double are unpacked into one of the int32_t, uint32_t and u_int32_t integer types.
int32_t is used most often, which is wrong because the representation of signed int is implementation defined.
In general signed int is not handled carefully in the libm code: scalbn even depends on signed int overflow.
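For illustration, the exponent manipulation can be done with unsigned arithmetic only (a hypothetical sketch that, unlike the real scalbn, handles only normal numbers and small n):

```c
#include <stdint.h>

/* Multiply a normal double by 2^n by adjusting the exponent field
 * directly; no signed shifts or signed overflow involved.
 * No handling of overflow, subnormals, inf or nan. */
static double mul_pow2(double x, int n)
{
	union { double f; uint64_t i; } u = { x };
	u.i += (uint64_t)n << 52; /* exponent field is bits 52..62 */
	return u.f;
}
```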
gcc turns x*I into (x + 0*I)*(0 + I) = 0*x + x*I, so if x = inf then the result is nan + inf*I instead of inf*I (an fmul instruction is generated for 0*x).
(There were various complex constant folding issues as well in gcc).
So a+b*I cannot be used to create complex numbers, instead some double[2] manipulation should be used to make sure there is no fmul in the generated code (freebsd uses a static inline cpack function) (It's not clear which is better: using union hacks everywhere in the complex code or creal, cimag, cpack)
Complex code is hard to make portable across compilers as there is no good compiler support, it is rarely used, and the standard leaves several options open: complex support is optional and the imaginary type is optional.
It's not clear how I (or _Complex_I) should be defined in complex.h (the literal for the float imaginary unit is compiler specific; in gcc it can be written as 1.0fi).
multiprecision libs (useful for tests)
other links