sources
-------

fdlibm sources (freebsd is probably the most up to date, but some
code is only available from openbsd)

freebsd /lib/msun
	svn checkout svn://svn.freebsd.org/base/head/lib/msun
openbsd /src/lib/libm
	cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm

code organization
-----------------

math code needs some special code organization or build system
tweaks

* long double:

(representation issues are below)

code organization constraints:
1.) internal .h to bit manipulate float and double
2.) long double representation specific .h
3.) long double representation specific .c
4.) common long double code for all representations
5.) no long double code when long double == double
6.) arch specific .s

1. can be put in src/internal
2. and 3. can be handled like arch specific code (separate
directories for the different long double representations, hooked
up somehow in the build system)
4. can be regular source files in math/, but needs ifdefs due to 5.
all double functions need an extra wrapper/alias to solve 5.
6. can be done the usual way

dir tree:

/include/math.h
/include/complex.h
/arch/ld80/ldouble.h - duplicate per arch?
/arch/ld128/ldouble.h
/src/internal/libm.h
/src/math/ld80/*l.c
/src/math/ld128/*l.c
/src/math/arm/*.s
/src/math/i386/*.s
/src/math/x86_64/*.s
/src/math/*.c

in freebsd most of the long double code is common to the different
representations, but there are some representation specific files
in ld80/ and ld128/

it is possible to have all the long double code duplicated and
maintained separately in ld80/ and ld128/ (or just ld80/, as ld128/
is rarely needed); the implementations always have a few differences
(even though most of the logic is the same and the constants only
differ in precision etc), so 4. can be mostly eliminated

or have no separate ld80/ and ld128/ but use ifdefs and macros in
the *l.c code, so 3. can be eliminated
* complex:

it makes sense to make the complex code optional (the compiler
might not support it); this needs some hacks: ifdefed sources or
some build system hack

* code style and naming:

fdlibm has a lot of ugly, badly indented code; it makes sense to
clean it up a bit if we touch it anyway, but it is also good if the
code can be easily compared to other fdlibms..
fdlibm uses e_ s_ k_ w_ source code prefixes (i guess they stand
for ieee, standard c, kernel and wrapper functions, but i'm not
sure; i'd rather drop these if fdlibm diffability were not
important)

representation
--------------

float and double bit manipulation can be handled in a portable way
in c:
float is ieee binary32
double is ieee binary64
(assuming old non-standard representations are not supported)
the endianness may still vary, but that can be worked around by
using a union with a single large enough unsigned int

long double bit manipulation is harder as there are various
representations:
1.) same as double: ieee binary64
2.) ld80: 80bit extended precision format
3.) ld128: quadruple precision, ieee binary128
(assuming other non-standard formats are not supported)

case 1. is easy to handle: all long double functions are aliased to
the corresponding double ones (long double is treated equivalently
to double)
case 2. is the most common (i386, x86_64, emulation)
case 3. is rare and means software emulation (sparc64)

ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit
ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit
endianness can vary (and there is no large enough unsigned int to
handle it); in the case of ld80 the padding can vary as well (eg
the 80bit float of m68k and m88k cpus has different padding than
the intel one)

supporting ld80 and ld128 with the same code can be done, but it
requires ifdefs and macro hacks; treating them separately means a
lot of code duplication
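the union trick for double described above might look like this
minimal sketch (dshape, double_bits etc are illustrative names, not
a claim about fdlibm's actual internals):

```c
#include <stdint.h>

/* sketch: view an ieee binary64 double as a single 64bit unsigned
   int; using one uint64_t instead of two 32bit words sidesteps the
   word order (endianness) problem */
union dshape {
	double value;
	uint64_t bits;
};

uint64_t double_bits(double x)
{
	union dshape u;
	u.value = x;
	return u.bits;
}

/* illustrative helpers built on the raw bits */
int sign_bit(double x)
{
	return double_bits(x) >> 63;
}

int exponent_field(double x)
{
	/* 11bit biased exponent, bits 52..62 */
	return (double_bits(x) >> 52) & 0x7ff;
}
```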
ugly
----

some notes are from:
http://www.vinc17.org/research/extended.en.html

1.) double rounding

the c language allows arithmetic to be evaluated in higher
precision, and on x86 linux the default fpu setting is to round the
results in extended precision (only x87 instructions are affected,
sse2 is not; freebsd and openbsd use double precision by default)
rounding a result in extended precision first, then rounding it to
double when storing it, may give different results than a single
rounding to double, so x = a+b may give different results depending
on the fpu setting (this only happens in round to nearest rounding
mode, but that's the most common one)

2.) wider exponent range

even if the fpu is set to double precision (which it is not by
default on linux), the x87 registers use a wider exponent range, so
underflows (subnormals) may not be treated according to the ieee
standard (unless every computation result is stored and rounded to
double)

3.) precision of intermediate results

c does not require consistent evaluation precision: the compiler
may store intermediate results and round them to double while
keeping other parts in extended precision, so (a+b)==(a+b) may be
false when the two sides are kept in different precision
c99 has a way to control this: when a result is stored in a
variable or a type cast is used, the precision is guaranteed to be
appropriate to that type, so in (double)(a+b)==(double)(a+b) both
sides are guaranteed to be in double precision when the comparison
is done
(this still does not solve 1. though: eg if the left side is
evaluated with sse2 and the right side with x87 then the result may
be false)
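the c99 cast rule and the volatile store workaround can be sketched
as follows (function names are made up for illustration; on an
sse2-only target both behave the same, the difference only shows up
with x87 excess precision):

```c
/* sketch: two ways to force a result to double precision even when
   the fpu evaluates in extended precision */
double add_cast(double a, double b)
{
	/* c99: the cast guarantees the sum is rounded to double */
	return (double)(a + b);
}

double add_store(double a, double b)
{
	/* pre-c99 workaround: a volatile store forces the value
	   through memory, rounding it to double */
	volatile double t = a + b;
	return t;
}
```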
unfortunately gcc does not respect the standard (the infamous gcc
bug 323); gcc 4.5 fixed it with '-fexcess-precision=standard' (it
is enabled by '-std=c99', but the default is
'-fexcess-precision=fast')
the workaround for older gcc is to force the compiler to store the
intermediate results: by using volatile double temporary variables
or by '-ffloat-store' (slow)
(sometimes the excess precision is good, but it's hard to rely on
it as it is optional; there is a way to check for it using
FLT_EVAL_METHOD, float_t and double_t, but even then it's probably
easier to unconditionally cast the variables of an expression to a
higher precision when that's what we want, and not to depend on the
implicit excess precision)

4.) runtime vs compile time

runtime and compile time semantics may be different: gcc often does
unwanted or even invalid compile time optimizations (constant
folding where floating point exception flags should be raised at
runtime, or where the result depends on the runtime floating point
environment, or constant expressions are simply evaluated with a
different precision than at runtime)
the workaround is again a named volatile variable like
	static const volatile double two52 = 0x1p52;
old gcc truncated long double constant expressions on x86; the
workaround is to put the constant together from doubles at runtime
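the named volatile workaround, sketched (a hypothetical example:
the load of a volatile cannot be constant folded, so the arithmetic
and its exception flags are produced by the runtime fpu):

```c
/* sketch: tiny is a named volatile constant, so gcc cannot fold
   tiny*tiny at compile time; the multiplication happens on the
   runtime fpu, raising the underflow and inexact flags there */
static const volatile double tiny = 0x1p-1000;

double runtime_underflow(void)
{
	/* 2^-2000 is below the smallest subnormal (2^-1074), so the
	   product underflows to 0.0 at runtime */
	return tiny * tiny;
}
```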
5.) ld80 vs ld128 bit representations

(see above) the two representations are sufficiently different that
treating them together is awkward
ld80 does not have an implicit msb and the mantissa is best handled
by a single uint64_t
ld128 has an implicit msb and the mantissa cannot be handled in a
single unsigned int
(typical operations:
	get/set the exp and the sign bit (same for ld80 and ld128)
	check x==0, mant==0 (ld80: the explicit msb should be masked)
	check |x| < 1, 0.5 (exp + mant manipulation)
	check if subnormal
	check if integral)
in the freebsd fdlibm code a few architecture specific macros and a
union handle these issues

6.) signed int

the fdlibm code has many inconsistencies (naming conventions, 0x1p0
notation vs decimal notation, ..) and in the integer type used for
bit manipulation: when the bits of a double are manipulated it is
unpacked into two 32bit words, which can be int32_t, uint32_t or
u_int32_t
int32_t is used most often, which is wrong because of the
implementation defined signed int representation (in general signed
int is not handled carefully: scalbn even depends on signed int
overflow)

7.) complex is not implemented in gcc properly

x*I turns into (x+0*I)*(0+I) = 0*x + x*I, so if x=inf then the
result is nan+inf*I instead of inf*I (an fmul instruction is
generated for 0*x)
so a+b*I cannot be used to create complex numbers; instead some
double[2] manipulation should be used to make sure there is no fmul
in the generated code (freebsd uses a static inline cpack function)
it's not clear which is better: using union hacks everywhere in the
complex code, or creal, cimag, cpack:

union dcomplex {
	double complex z;
	double a[2];
};
// for efficiency these should be macros or inline functions
#define creal(z) ((double)(z))
#define cimag(z) ((union dcomplex){(z)}.a[1])
#define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z)

eg.
various implementations for conj:

// proper implementation, but gcc does not do the right thing
double complex conj(double complex z)
{
	return creal(z) - cimag(z)*I;
}

// using cpack
double complex conj(double complex z)
{
	return cpack(creal(z), -cimag(z));
}

// explicit work around
double complex conj(double complex z)
{
	union dcomplex u = {z};
	u.a[1] = -u.a[1];
	return u.z;
}

8.) complex code is hard to make portable across compilers

eg it's not clear how I (or _Complex_I) should be defined in
complex.h (the literal of the float imaginary unit is compiler
specific; in gcc it can be 1.0fi)
complex support is not mandatory in c99 conformant implementations,
and even if complex is supported there might or might not be a
separate imaginary type

libm implementations
--------------------

unix v7 libm by Robert Morris (rhm) around 1979
	see: "Computer Approximations" by Hart & Cheney, 1968
	used in: plan9, go

cephes by Stephen L. Moshier, 1984-1992
	http://www.netlib.org/cephes/
	see: "Methods and Programs for Mathematical Functions" by
	Moshier, 1989

fdlibm by K. C. Ng around 1993, updated in 1995?
	(1993: copyright notice: "Developed at SunPro ..")
	(1995: copyright notice: "Developed at SunSoft ..")
	see: "Argument Reduction for Huge Arguments: Good to the
	Last Bit" by Ng, 1992
	http://www.netlib.org/fdlibm
	used in: netbsd, openbsd, freebsd, bionic, musl

libultim by Abraham Ziv around 2001 (gpl)
	(also known as the ibm accurate portable math lib)
	see: "Fast evaluation of elementary mathematical functions
	with correctly rounded last bit" by Ziv, 1991
	http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/
	used in: glibc

libmcr by K. C. Ng, 2004
	(sun's correctly rounded libm)

mathcw by Nelson H. F. Beebe, 2010?
	http://ftp.math.utah.edu/pub/mathcw/
	(no sources available?)
sleef by Naoki Shibata, 2010
	(simd lib for evaluating elementary functions)
	http://shibatch.sourceforge.net/

crlibm by ens-lyon, 2004-2010
	http://lipforge.ens-lyon.fr/www/crlibm/
	see: crlibm.pdf at http://lipforge.ens-lyon.fr/frs/?group_id=8&release_id=123
	see: "Elementary Functions" by Jean-Michel Muller, 2005
	see: "Handbook of Floating-Point Arithmetic" by (many authors
	from Lyon), 2009

(closed: intel libm, hp libm for itanium, ..)

libm tests
----------

http://www.netlib.org/fp/ucbtest.tgz
http://www.jhauser.us/arithmetic/TestFloat.html
http://cant.ua.ac.be/old/ieeecc754.html
http://www.loria.fr/~zimmerma/mpcheck/
http://gforge.inria.fr/projects/mpcheck/
http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html
http://www.math.utah.edu/~beebe/software/ieee/
http://www.vinc17.org/research/testlibm/index.en.html
http://www.vinc17.org/research/fptest.en.html
tests in crlibm

multiprecision libs (useful for tests):
http://mpfr.org/
http://www.multiprecision.org/index.php?prog=mpc
http://pari.math.u-bordeaux.fr/
http://www.apfloat.org/apfloat/
http://code.google.com/p/fastfunlib/
scs_lib in crlibm

other
-----

ieee standard: http://754r.ucbtest.org/
extended precision issues: http://www.vinc17.org/research/extended.en.html
correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
fast conversion to fixed point: http://stereopsis.com/sree/fpu2006.html
double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
papers by kahan: http://www.cs.berkeley.edu/~wkahan/
fp paper by goldberg:
http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
math functions in general: http://dlmf.nist.gov/