X-Git-Url: http://nsz.repo.hu/git/?p=www;a=blobdiff_plain;f=libm.txt;h=226243bbca52747af14017a8cc72900f91e20285;hp=7d2d44d5cad0f0ff74ce8733c5da8930d21bfe36;hb=HEAD;hpb=8096c8147b13da7d8c1a1b216cba3d00582b6e20 diff --git a/libm.txt b/libm.txt index 7d2d44d..226243b 100644 --- a/libm.txt +++ b/libm.txt @@ -1,399 +1 @@ -sources -------- - -fdlibm sources -(freebsd is most up to date probably, but some code is -only available from openbsd) - -freebsd /lib/msun -svn checkout svn://svn.freebsd.org/base/head/lib/msun - -openbsd /src/lib/libm -cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm - - -code organization ------------------ - -math code needs some special code organization -or build system tweaks - -* long double: -(representation issues are below) -code organization constraints: -1.) internal .h to bit manipulate float and double -2.) long double representation specific .h -3.) long double representation specific .c -4.) common long double code for all representations -5.) no long double code when long double == double -6.) arch specific .s - -1. can be put in src/internal -2. and 3. can be handled like arch specific code -(separate directories for different long double -representations and then somehow hook them up -in the build system) -4. can be regular source files in math/ but needs -ifdefs due to 5. -all double functions need an extra wrapper/alias -to solve 5. -6. can be done the usual way - -dir tree: -/include/math.h -/include/complex.h -/arch/ld80/ldouble.h - duplicate per arch ? -/arch/ld128/ldouble.h -/src/internal/libm.h -/src/math/ld80/*l.c -/src/math/ld128/*l.c -/src/math/arm/*.s -/src/math/i386/*.s -/src/math/x86_64/*.s -/src/math/*.c - -in freebsd most of the long double code is common -to the different representations but there are some -representation specific files in ld80/ and ld128/ - -it is possible to have all the long double code -duplicated and maintained separately in ld80/ and -ld128/ (or just ld80/ as ld128/ is rarely needed), -the implementations always have a few differences -(eventhough most of the logic is the same and the -constants only differ in precision etc) -so 4. can be mostly eliminated -or have no separate ld80/ and ld128/ but use ifdefs -and macros in the *l.c codes -os 3. can be eliminated - -* complex: -it makes sense to make complex code optional -(compiler might not support it) -this needs some hacks: ifdefed sources or -some buildsystem hack - -* code style and naming: -fdlibm has many ugly badly indented code -it makes sense to clean them up a bit if we -touch them anyway, but it's also good if the -code can be easily compared to other fdlibms.. - -fdlibm uses e_ s_ k_ w_ source code prefixes -(i guess they stand for ieee, standard c, kernel and -wrapper functions, but i'm not sure, i'd rather drop -these if fdlibm diffability was not important) - - -representation --------------- - -float and double bit manipulation can be handled -in a portable way in c - -float is ieee binary32 -double is ieee binary64 -(assuming old non-standard representations are not supported) - -the endianness may still vary, but that can be worked -around by using a union with a single large enough -unsigned int - -long double bit manipulation is harder as there are -various representations: -1.) same as double: ieee binary64 -2.) ld80: 80bit extended precision format -3.) ld128: quadruple precision, ieee binary128 -(assuming other non-standard formats are not supported) - -case 1. is easy to handle: all long double functions -are aliased to the corresponding double one -(long double is treated equivalently to double) - -case 2. is the most common (i386, x86_64, emulation) -case 3. is rare and means software emulation (sparc64) - -ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit -ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit - -endianness can vary (and there is no large enough unsigned -int to handle it), in case of ld80 padding can vary as well -(eg the 80bit float of m68k and m88k cpus has different -padding than the intel one) - -supporting ld80 and ld128 with the same code can be done, -but requres ifdefs and macro hacks, treating them separately -means many code duplications - - -ugly ----- - -some notes are from: -http://www.vinc17.org/research/extended.en.html - -1.) double rounding -the c language allows arithmetic to be evaluated in -higher precision and on x86 linux the default fpu -setting is to round the results in extended precision -(only x87 instructions are affected, sse2 is not) -(freebsd and openbsd use double precision by default) - -rounding a result in extended precision first then -rounding it to double when storing it may give -different results than a single rounding to double -so x = a+b may give different results depending on -the fpu setting -(only happens in round to nearest rounding mode, -but that's the most common) - -2.) wider exponent range -even if the fpu is set to double precision -(which is not) the x87 registers use wider exponent -range so underflows (subnormals) may not be treated -according to the ieee standard (unless every -computation result is stored and rounded to double) - -3.) precision of intermediate results -c does not require consistent evaluation -precision: the compiler may store intermediate -results and round them to double while keep -other parts in extended precision -so (a+b)==(a+b) may be false when the two sides -are kept in different precision - -c99 has a way to control this: -when a result is stored in a variable or a -type cast is used, then it is guaranteed that -the precision is appropriate to that type -so in (double)(a+b)==(double)(a+b) both -sides are guaranteed to be in double precision -when the comparision is done -(this still does not solve 1. though: eg if -the left side is evaluated with sse2 and the -right side with x87 then the result may be false) - -unfortunately gcc does not respect the standard -(infamous gcc 323 bug) -gcc 4.5 fixed it with '-fexcess-precision=standard' -(it is enabled by '-std=c99', but the default is -'-fexcess-precision=fast') - -the workaround for older gcc is to force the -compiler to store the intermediate results: -by using volatile double temporary variables -or by '-ffloat-store' (slow) - -(sometimes the excess precision is good -but it's hard to rely on it as it is optional - -there is a way to check for it though using -FLT_EVAL_METHOD, float_t and double_t - -but even then it's probably easier to -unconditionally cast the variables of an -expression to a higher precision when -that's what we want and don't depend -on the implicit excess precision) - -4.) runtime vs compile time -runtime and compile time semantics may be different -gcc often does unwanted or even invalid compile -time optimizations -(constant folding where floating point exception -flags should be raised at runtime, or the result -depends on runtime floating point environment, -or constant expressions are just evaluated with -different precision than at runtime) - -the workaround is again using named volatile -variables like -static const volatile two52 = 0x1p52; - -(?) old gcc truncated long double const expressions -on x86, the workaround is to put the constant -together from doubles at runtime - -5.) ld80 vs ld128 bit representations (see above) -the two representations are sufficiently different -that treating them together is awkward - -ld80 does not have implicit msb and the mantissa is -best handled by a single uint64_t -ld128 has implicit msb and the mantissa cannot be -handled in a single unsigned int - -(typical operations: -get/set exp and sign bit (same for ld80 and ld128) -check x==0, mant==0 (ld80: explicit bit should be masked) -check |x| < 1, 0.5 (exp + mant manipulation) -check if subnormal -check if integral) - -in the freebsd fdlibm code a few architecture specific -macros and a union handle these issues - -6.) signed int -the fdlibm code has many inconsistencies -naming conventions, 0x1p0 notation vs decimal notation, -..and integer type used for bitmanipulation - -when the bits of a double is manipulated it is unpacked -into two 32bit words, which can be -int32_t, uint32_t, u_int32_t - -int32_t is used most often which is wrong because of -implementation defined signed int representation -(in general signed int is not handled carefully -scalbn even depends on signed int overflow) - -7.) complex is not implemented in gcc properly - -x*I turns into (x+0*I)*(0+I) = 0*x + x*I -so if x=inf then the result is nan+inf*I instead of inf*I -(an fmul instruction is generated for 0*x) - -so a+b*I cannot be used to create complex numbers -instead some double[2] manipulation should be used to -make sure there is no fmul in the generated code -(freebsd uses a static inline cpack function) - -it's not clear which is better: using union hacks -everywhere in the complex code or creal, cimag, cpack - -union dcomplex { - double complex z; - double a[2]; -}; -// for efficiency these should be macros or inline functions -#define creal(z) ((double)(z)) -#define cimag(z) ((union dcomplex){(z)}.a[1]) -#define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z) - -eg. various implementations for conj: - -// proper implementation, but gcc does not do the right thing -double complex conj(double complex z) { - return creal(z) - cimag(z)*I; -} - -// using cpack -double complex conj(double complex z) { - return cpack(creal(z), -cimag(z)); -} - -// explicit work around -double complex conj(double complex z) { - union dcomplex u = {z}; - - u.a[1] = -u.a[1]; - return u.z; -} - -8.) complex code is hard to make portable across compilers - -eg it's not clear how I (or _Complex_I) should be defined -in complex.h -(literal of the float imaginary unit is compiler specific, -in gcc it can be 1.0fi) - -complex support is not mandatory in c99 conformant -implementations and even if complex is supported -there might or might not be separate imaginary type - - -libm implementations --------------------- - -unix v7 libm by Robert Morris (rhm) around 1979 -see: "Computer Approximations" by Hart & Cheney, 1968 -used in: plan9, go - -cephes by Stephen L. Moshier, 1984-1992 -http://www.netlib.org/cephes/ -see: "Methods and Programs for Mathematical Functions" by Moshier 1989 - -fdlibm by K. C. Ng around 1993, updated in 1995? -(1993: copyright notice: "Developed at SunPro ..") -(1995: copyright notice: "Developed at SunSoft ..") -see: "Argument Reduction for Huge Arguments: Good to the Last Bit" by Ng 1992 -http://www.netlib.org/fdlibm -used in: netbsd, openbsd, freebsd, bionic, musl - -libultim by Abraham Ziv around 2001 (gpl) -(also known as the ibm accurate portable math lib) -see: "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991 -http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/ -used in: glibc - -libmcr by K. C. Ng, 2004 -(sun's correctly rounded libm) - -mathcw by Nelson H. F. Beebe, 2010? -http://ftp.math.utah.edu/pub/mathcw/ -(no sources available?) - -sleef by Naoki Shibata, 2010 -(simd lib for evaluating elementary functions) -http://shibatch.sourceforge.net/ - -crlibm by ens-lyon, 2004-2010 -http://lipforge.ens-lyon.fr/www/crlibm/ -see: crlibm.pdf at http://lipforge.ens-lyon.fr/frs/?group_id=8&release_id=123 -see: "Elementary Functions" by Jean-Michel Muller 2005 -see: "Handbook of Floating-Point Arithmetic" by (many authors from Lyon) 2009 - -(closed: intel libm, hp libm for itanium, ..) - - -libm tests ----------- - -http://www.netlib.org/fp/ucbtest.tgz -http://www.jhauser.us/arithmetic/TestFloat.html -http://cant.ua.ac.be/old/ieeecc754.html -http://www.loria.fr/~zimmerma/mpcheck/ -http://gforge.inria.fr/projects/mpcheck/ -http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html -http://www.math.utah.edu/~beebe/software/ieee/ -http://www.vinc17.org/research/testlibm/index.en.html -http://www.vinc17.org/research/fptest.en.html -tests in crlibm - -multiprecision libs (useful for tests) -http://mpfr.org/ -http://www.multiprecision.org/index.php?prog=mpc -http://pari.math.u-bordeaux.fr/ -http://www.apfloat.org/apfloat/ -http://code.google.com/p/fastfunlib/ -scs_lib in crlibm - - -other ------ - -ieee standard: http://754r.ucbtest.org/ -extended precision issues: http://www.vinc17.org/research/extended.en.html -correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html -table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html -finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html -finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html -finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html -finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm -generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/ -fast conversion to fixedpoint: http://stereopsis.com/sree/fpu2006.html -double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/ -papers by kahan: http://www.cs.berkeley.edu/~wkahan/ -fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html -math functions in general: http://dlmf.nist.gov/ - - - - - - - - - - - +see http://nsz.repo.hu/libm