sources
-------

fdlibm sources (freebsd is probably the most up to date, but some
code is only available from openbsd)

freebsd /lib/msun
	svn checkout svn://svn.freebsd.org/base/head/lib/msun
openbsd /src/lib/libm
	cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm

code organization
-----------------

math code needs some special code organization or build system
tweaks

* long double:

(representation issues are below)

code organization constraints:
1.) internal .h to bit manipulate float and double
2.) long double representation specific .h
3.) long double representation specific .c
4.) common long double code for all representations
5.) no long double code when long double == double
6.) arch specific .s

1. can be put in src/internal
2. and 3. can be handled like arch specific code (separate
directories for the different long double representations, hooked
up somehow in the build system)
4. can be regular source files in math/, but needs ifdefs due to 5.
all double functions need an extra wrapper/alias to solve 5.
6. can be done the usual way

dir tree:

/include/math.h
/include/complex.h
/arch/ld80/ldouble.h - duplicate per arch?
/arch/ld128/ldouble.h
/src/internal/libm.h
/src/math/ld80/*l.c
/src/math/ld128/*l.c
/src/math/arm/*.s
/src/math/i386/*.s
/src/math/x86_64/*.s
/src/math/*.c

in freebsd most of the long double code is common to the different
representations, but there are some representation specific files
in ld80/ and ld128/

it is possible to have all the long double code duplicated and
maintained separately in ld80/ and ld128/ (or just ld80/, as ld128/
is rarely needed); the implementations always have a few differences
(even though most of the logic is the same and the constants only
differ in precision etc), so 4. can be mostly eliminated

or have no separate ld80/ and ld128/ but use ifdefs and macros in
the *l.c code, so 3. can be eliminated
* complex:

it makes sense to make the complex code optional (the compiler
might not support it); this needs some hacks: ifdefed sources or
some build system hack

* code style and naming:

fdlibm has a lot of ugly, badly indented code; it makes sense to
clean it up a bit if we touch it anyway, but it is also good if the
code can be easily compared to other fdlibms..
fdlibm uses e_ s_ k_ w_ source code prefixes (i guess they stand
for ieee, standard c, kernel and wrapper functions, but i'm not
sure; i'd rather drop these if fdlibm diffability were not
important)

representation
--------------

float and double bit manipulation can be handled in a portable way
in c:
float is ieee binary32
double is ieee binary64
(assuming old non-standard representations are not supported)
the endianness may still vary, but that can be worked around by
using a union with a single large enough unsigned int

long double bit manipulation is harder as there are various
representations:
1.) same as double: ieee binary64
2.) ld80: 80bit extended precision format
3.) ld128: quadruple precision, ieee binary128
(assuming other non-standard formats are not supported)

case 1. is easy to handle: all long double functions are aliased to
the corresponding double ones (long double is treated equivalently
to double)
case 2. is the most common (i386, x86_64, emulation)
case 3. is rare and means software emulation (sparc64)

ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit
ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit
endianness can vary (and there is no large enough unsigned int to
handle it); in the case of ld80 the padding can vary as well (eg
the 80bit float of m68k and m88k cpus has different padding than
the intel one)

supporting ld80 and ld128 with the same code can be done, but it
requires ifdefs and macro hacks; treating them separately means a
lot of code duplication
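the union trick for double described above might look like this
minimal sketch (dshape, double_bits etc are illustrative names, not
a claim about fdlibm's actual internals):

```c
#include <stdint.h>

/* sketch: view an ieee binary64 double as a single 64bit unsigned
   int; using one uint64_t instead of two 32bit words sidesteps the
   word order (endianness) problem */
union dshape {
	double value;
	uint64_t bits;
};

uint64_t double_bits(double x)
{
	union dshape u;
	u.value = x;
	return u.bits;
}

/* illustrative helpers built on the raw bits */
int sign_bit(double x)
{
	return double_bits(x) >> 63;
}

int exponent_field(double x)
{
	/* 11bit biased exponent, bits 52..62 */
	return (double_bits(x) >> 52) & 0x7ff;
}
```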
ugly
----

some notes are from:
http://www.vinc17.org/research/extended.en.html

1.) double rounding

the c language allows arithmetic to be evaluated in higher
precision, and on x86 linux the default fpu setting is to round the
results in extended precision (only x87 instructions are affected,
sse2 is not; freebsd and openbsd use double precision by default)
rounding a result in extended precision first, then rounding it to
double when storing it, may give different results than a single
rounding to double, so x = a+b may give different results depending
on the fpu setting (this only happens in round to nearest rounding
mode, but that's the most common one)

2.) wider exponent range

even if the fpu is set to double precision (which it is not by
default on linux), the x87 registers use a wider exponent range, so
underflows (subnormals) may not be treated according to the ieee
standard (unless every computation result is stored and rounded to
double)

3.) precision of intermediate results

c does not require consistent evaluation precision: the compiler
may store intermediate results and round them to double while
keeping other parts in extended precision, so (a+b)==(a+b) may be
false when the two sides are kept in different precision
c99 has a way to control this: when a result is stored in a
variable or a type cast is used, the precision is guaranteed to be
appropriate to that type, so in (double)(a+b)==(double)(a+b) both
sides are guaranteed to be in double precision when the comparison
is done
(this still does not solve 1. though: eg if the left side is
evaluated with sse2 and the right side with x87 then the result may
be false)
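the c99 cast rule and the volatile store workaround can be sketched
as follows (function names are made up for illustration; on an
sse2-only target both behave the same, the difference only shows up
with x87 excess precision):

```c
/* sketch: two ways to force a result to double precision even when
   the fpu evaluates in extended precision */
double add_cast(double a, double b)
{
	/* c99: the cast guarantees the sum is rounded to double */
	return (double)(a + b);
}

double add_store(double a, double b)
{
	/* pre-c99 workaround: a volatile store forces the value
	   through memory, rounding it to double */
	volatile double t = a + b;
	return t;
}
```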
unfortunately gcc does not respect the standard (the infamous gcc
bug 323); gcc 4.5 fixed it with '-fexcess-precision=standard' (it
is enabled by '-std=c99', but the default is
'-fexcess-precision=fast')
the workaround for older gcc is to force the compiler to store the
intermediate results: by using volatile double temporary variables
or by '-ffloat-store' (slow)
(sometimes the excess precision is good, but it's hard to rely on
it as it is optional; there is a way to check for it using
FLT_EVAL_METHOD, float_t and double_t, but even then it's probably
easier to unconditionally cast the variables of an expression to a
higher precision when that's what we want, and not to depend on the
implicit excess precision)

4.) runtime vs compile time

runtime and compile time semantics may be different: gcc often does
unwanted or even invalid compile time optimizations (constant
folding where floating point exception flags should be raised at
runtime, or where the result depends on the runtime floating point
environment, or constant expressions are simply evaluated with a
different precision than at runtime)
the workaround is again a named volatile variable like
	static const volatile double two52 = 0x1p52;
old gcc truncated long double constant expressions on x86; the
workaround is to put the constant together from doubles at runtime
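the named volatile workaround, sketched (a hypothetical example:
the load of a volatile cannot be constant folded, so the arithmetic
and its exception flags are produced by the runtime fpu):

```c
/* sketch: tiny is a named volatile constant, so gcc cannot fold
   tiny*tiny at compile time; the multiplication happens on the
   runtime fpu, raising the underflow and inexact flags there */
static const volatile double tiny = 0x1p-1000;

double runtime_underflow(void)
{
	/* 2^-2000 is below the smallest subnormal (2^-1074), so the
	   product underflows to 0.0 at runtime */
	return tiny * tiny;
}
```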
5.) ld80 vs ld128 bit representations

(see above) the two representations are sufficiently different that
treating them together is awkward
ld80 does not have an implicit msb and the mantissa is best handled
by a single uint64_t
ld128 has an implicit msb and the mantissa cannot be handled in a
single unsigned int
(typical operations:
	get/set the exp and the sign bit (same for ld80 and ld128)
	check x==0, mant==0 (ld80: the explicit msb should be masked)
	check |x| < 1, 0.5 (exp + mant manipulation)
	check if subnormal
	check if integral)
in the freebsd fdlibm code a few architecture specific macros and a
union handle these issues

6.) signed int

the fdlibm code has many inconsistencies (naming conventions, 0x1p0
notation vs decimal notation, ..) and in the integer type used for
bit manipulation: when the bits of a double are manipulated it is
unpacked into two 32bit words, which can be int32_t, uint32_t or
u_int32_t
int32_t is used most often, which is wrong because of the
implementation defined signed int representation (in general signed
int is not handled carefully: scalbn even depends on signed int
overflow)

7.) complex is not implemented in gcc properly

x*I turns into (x+0*I)*(0+I) = 0*x + x*I, so if x=inf then the
result is nan+inf*I instead of inf*I (an fmul instruction is
generated for 0*x)
so a+b*I cannot be used to create complex numbers; instead some
double[2] manipulation should be used to make sure there is no fmul
in the generated code (freebsd uses a static inline cpack function)
it's not clear which is better: using union hacks everywhere in the
complex code, or creal, cimag, cpack:

union dcomplex {
	double complex z;
	double a[2];
};
// for efficiency these should be macros or inline functions
#define creal(z) ((double)(z))
#define cimag(z) ((union dcomplex){(z)}.a[1])
#define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z)

eg.
various implementations for conj:

// proper implementation, but gcc does not do the right thing
double complex conj(double complex z)
{
	return creal(z) - cimag(z)*I;
}

// using cpack
double complex conj(double complex z)
{
	return cpack(creal(z), -cimag(z));
}

// explicit work around
double complex conj(double complex z)
{
	union dcomplex u = {z};
	u.a[1] = -u.a[1];
	return u.z;
}

8.) complex code is hard to make portable across compilers

eg it's not clear how I (or _Complex_I) should be defined in
complex.h (the literal of the float imaginary unit is compiler
specific; in gcc it can be 1.0fi)
complex support is not mandatory in c99 conformant implementations,
and even if complex is supported there might or might not be a
separate imaginary type

libm implementations
--------------------

unix v7 libm by Robert Morris (rhm) around 1979
	see: "Computer Approximations" by Hart & Cheney, 1968
	used in: plan9, go

cephes by Stephen L. Moshier, 1984-1992
	http://www.netlib.org/cephes/
	see: "Methods and Programs for Mathematical Functions" by
	Moshier, 1989

fdlibm by K. C. Ng around 1993, updated in 1995?
	(1993: copyright notice: "Developed at SunPro ..")
	(1995: copyright notice: "Developed at SunSoft ..")
	see: "Argument Reduction for Huge Arguments: Good to the
	Last Bit" by Ng, 1992
	http://www.netlib.org/fdlibm
	used in: netbsd, openbsd, freebsd, bionic, musl

libultim by Abraham Ziv around 2001 (gpl)
	(also known as the ibm accurate portable math lib)
	see: "Fast evaluation of elementary mathematical functions
	with correctly rounded last bit" by Ziv, 1991
	http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/
	used in: glibc

libmcr by K. C. Ng, 2004
	(sun's correctly rounded libm)

mathcw by Nelson H. F. Beebe, 2010?
	http://ftp.math.utah.edu/pub/mathcw/
	(no sources available?)
sleef by Naoki Shibata, 2010
	(simd lib for evaluating elementary functions)
	http://shibatch.sourceforge.net/

crlibm by ens-lyon, 2004-2010
	http://lipforge.ens-lyon.fr/www/crlibm/
	see: crlibm.pdf at http://lipforge.ens-lyon.fr/frs/?group_id=8&release_id=123
	see: "Elementary Functions" by Jean-Michel Muller, 2005
	see: "Handbook of Floating-Point Arithmetic" by (many authors
	from Lyon), 2009

(closed: intel libm, hp libm for itanium, ..)

libm tests
----------

http://www.netlib.org/fp/ucbtest.tgz
http://www.jhauser.us/arithmetic/TestFloat.html
http://cant.ua.ac.be/old/ieeecc754.html
http://www.loria.fr/~zimmerma/mpcheck/
http://gforge.inria.fr/projects/mpcheck/
http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html
http://www.math.utah.edu/~beebe/software/ieee/
http://www.vinc17.org/research/testlibm/index.en.html
http://www.vinc17.org/research/fptest.en.html
tests in crlibm

multiprecision libs (useful for tests):
http://mpfr.org/
http://www.multiprecision.org/index.php?prog=mpc
http://pari.math.u-bordeaux.fr/
http://www.apfloat.org/apfloat/
http://code.google.com/p/fastfunlib/
scs_lib in crlibm

other
-----

ieee standard: http://754r.ucbtest.org/
extended precision issues: http://www.vinc17.org/research/extended.en.html
correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
fast conversion to fixed point: http://stereopsis.com/sree/fpu2006.html
double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
papers by kahan: http://www.cs.berkeley.edu/~wkahan/
fp paper by goldberg:
http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
math functions in general: http://dlmf.nist.gov/