X-Git-Url: http://nsz.repo.hu/git/?p=www;a=blobdiff_plain;f=libm.txt;h=226243bbca52747af14017a8cc72900f91e20285;hp=7d2d44d5cad0f0ff74ce8733c5da8930d21bfe36;hb=HEAD;hpb=8096c8147b13da7d8c1a1b216cba3d00582b6e20

diff --git a/libm.txt b/libm.txt
index 7d2d44d..226243b 100644
--- a/libm.txt
+++ b/libm.txt
@@ -1,399 +1 @@
-sources
--------
-
-fdlibm sources
-(freebsd is most up to date probably, but some code is
-only available from openbsd)
-
-freebsd /lib/msun
-svn checkout svn://svn.freebsd.org/base/head/lib/msun
-
-openbsd /src/lib/libm
-cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm
-
-
-code organization
------------------
-
-math code needs some special code organization
-or build system tweaks
-
-* long double:
-(representation issues are below)
-code organization constraints:
-1.) internal .h to bit manipulate float and double
-2.) long double representation specific .h
-3.) long double representation specific .c
-4.) common long double code for all representations
-5.) no long double code when long double == double
-6.) arch specific .s
-
-1. can be put in src/internal
-2. and 3. can be handled like arch specific code
-(separate directories for different long double
-representations and then somehow hook them up
-in the build system)
-4. can be regular source files in math/ but needs
-ifdefs due to 5.
-all double functions need an extra wrapper/alias
-to solve 5.
-6. can be done the usual way
-
-dir tree:
-/include/math.h
-/include/complex.h
-/arch/ld80/ldouble.h - duplicate per arch ?
-/arch/ld128/ldouble.h
-/src/internal/libm.h
-/src/math/ld80/*l.c
-/src/math/ld128/*l.c
-/src/math/arm/*.s
-/src/math/i386/*.s
-/src/math/x86_64/*.s
-/src/math/*.c
-
-in freebsd most of the long double code is common
-to the different representations but there are some
-representation specific files in ld80/ and ld128/
-
-it is possible to have all the long double code
-duplicated and maintained separately in ld80/ and
-ld128/ (or just ld80/ as ld128/ is rarely needed),
-the implementations always have a few differences
-(eventhough most of the logic is the same and the
-constants only differ in precision etc)
-so 4. can be mostly eliminated
-or have no separate ld80/ and ld128/ but use ifdefs
-and macros in the *l.c codes
-os 3. can be eliminated
-
-* complex:
-it makes sense to make complex code optional
-(compiler might not support it)
-this needs some hacks: ifdefed sources or
-some buildsystem hack
-
-* code style and naming:
-fdlibm has many ugly badly indented code
-it makes sense to clean them up a bit if we
-touch them anyway, but it's also good if the
-code can be easily compared to other fdlibms..
-
-fdlibm uses e_ s_ k_ w_ source code prefixes
-(i guess they stand for ieee, standard c, kernel and
-wrapper functions, but i'm not sure, i'd rather drop
-these if fdlibm diffability was not important)
-
-
-representation
---------------
-
-float and double bit manipulation can be handled
-in a portable way in c
-
-float is ieee binary32
-double is ieee binary64
-(assuming old non-standard representations are not supported)
-
-the endianness may still vary, but that can be worked
-around by using a union with a single large enough
-unsigned int
-
-long double bit manipulation is harder as there are
-various representations:
-1.) same as double: ieee binary64
-2.) ld80: 80bit extended precision format
-3.) ld128: quadruple precision, ieee binary128
-(assuming other non-standard formats are not supported)
-
-case 1. is easy to handle: all long double functions
-are aliased to the corresponding double one
-(long double is treated equivalently to double)
-
-case 2. is the most common (i386, x86_64, emulation)
-case 3. is rare and means software emulation (sparc64)
-
-ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit
-ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit
-
-endianness can vary (and there is no large enough unsigned
-int to handle it), in case of ld80 padding can vary as well
-(eg the 80bit float of m68k and m88k cpus has different
-padding than the intel one)
-
-supporting ld80 and ld128 with the same code can be done,
-but requres ifdefs and macro hacks, treating them separately
-means many code duplications
-
-
-ugly
-----
-
-some notes are from:
-http://www.vinc17.org/research/extended.en.html
-
-1.) double rounding
-the c language allows arithmetic to be evaluated in
-higher precision and on x86 linux the default fpu
-setting is to round the results in extended precision
-(only x87 instructions are affected, sse2 is not)
-(freebsd and openbsd use double precision by default)
-
-rounding a result in extended precision first then
-rounding it to double when storing it may give
-different results than a single rounding to double
-so x = a+b may give different results depending on
-the fpu setting
-(only happens in round to nearest rounding mode,
-but that's the most common)
-
-2.) wider exponent range
-even if the fpu is set to double precision
-(which is not) the x87 registers use wider exponent
-range so underflows (subnormals) may not be treated
-according to the ieee standard (unless every
-computation result is stored and rounded to double)
-
-3.) precision of intermediate results
-c does not require consistent evaluation
-precision: the compiler may store intermediate
-results and round them to double while keep
-other parts in extended precision
-so (a+b)==(a+b) may be false when the two sides
-are kept in different precision
-
-c99 has a way to control this:
-when a result is stored in a variable or a
-type cast is used, then it is guaranteed that
-the precision is appropriate to that type
-so in (double)(a+b)==(double)(a+b) both
-sides are guaranteed to be in double precision
-when the comparision is done
-(this still does not solve 1. though: eg if
-the left side is evaluated with sse2 and the
-right side with x87 then the result may be false)
-
-unfortunately gcc does not respect the standard
-(infamous gcc 323 bug)
-gcc 4.5 fixed it with '-fexcess-precision=standard'
-(it is enabled by '-std=c99', but the default is
-'-fexcess-precision=fast')
-
-the workaround for older gcc is to force the
-compiler to store the intermediate results:
-by using volatile double temporary variables
-or by '-ffloat-store' (slow)
-
-(sometimes the excess precision is good
-but it's hard to rely on it as it is optional
-
-there is a way to check for it though using
-FLT_EVAL_METHOD, float_t and double_t
-
-but even then it's probably easier to
-unconditionally cast the variables of an
-expression to a higher precision when
-that's what we want and don't depend
-on the implicit excess precision)
-
-4.) runtime vs compile time
-runtime and compile time semantics may be different
-gcc often does unwanted or even invalid compile
-time optimizations
-(constant folding where floating point exception
-flags should be raised at runtime, or the result
-depends on runtime floating point environment,
-or constant expressions are just evaluated with
-different precision than at runtime)
-
-the workaround is again using named volatile
-variables like
-static const volatile two52 = 0x1p52;
-
-(?) old gcc truncated long double const expressions
-on x86, the workaround is to put the constant
-together from doubles at runtime
-
-5.) ld80 vs ld128 bit representations (see above)
-the two representations are sufficiently different
-that treating them together is awkward
-
-ld80 does not have implicit msb and the mantissa is
-best handled by a single uint64_t
-ld128 has implicit msb and the mantissa cannot be
-handled in a single unsigned int
-
-(typical operations:
-get/set exp and sign bit (same for ld80 and ld128)
-check x==0, mant==0 (ld80: explicit bit should be masked)
-check |x| < 1, 0.5 (exp + mant manipulation)
-check if subnormal
-check if integral)
-
-in the freebsd fdlibm code a few architecture specific
-macros and a union handle these issues
-
-6.) signed int
-the fdlibm code has many inconsistencies
-naming conventions, 0x1p0 notation vs decimal notation,
-..and integer type used for bitmanipulation
-
-when the bits of a double is manipulated it is unpacked
-into two 32bit words, which can be
-int32_t, uint32_t, u_int32_t
-
-int32_t is used most often which is wrong because of
-implementation defined signed int representation
-(in general signed int is not handled carefully
-scalbn even depends on signed int overflow)
-
-7.) complex is not implemented in gcc properly
-
-x*I turns into (x+0*I)*(0+I) = 0*x + x*I
-so if x=inf then the result is nan+inf*I instead of inf*I
-(an fmul instruction is generated for 0*x)
-
-so a+b*I cannot be used to create complex numbers
-instead some double[2] manipulation should be used to
-make sure there is no fmul in the generated code
-(freebsd uses a static inline cpack function)
-
-it's not clear which is better: using union hacks
-everywhere in the complex code or creal, cimag, cpack
-
-union dcomplex {
-	double complex z;
-	double a[2];
-};
-// for efficiency these should be macros or inline functions
-#define creal(z) ((double)(z))
-#define cimag(z) ((union dcomplex){(z)}.a[1])
-#define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z)
-
-eg. various implementations for conj:
-
-// proper implementation, but gcc does not do the right thing
-double complex conj(double complex z) {
-	return creal(z) - cimag(z)*I;
-}
-
-// using cpack
-double complex conj(double complex z) {
-	return cpack(creal(z), -cimag(z));
-}
-
-// explicit work around
-double complex conj(double complex z) {
-	union dcomplex u = {z};
-
-	u.a[1] = -u.a[1];
-	return u.z;
-}
-
-8.) complex code is hard to make portable across compilers
-
-eg it's not clear how I (or _Complex_I) should be defined
-in complex.h
-(literal of the float imaginary unit is compiler specific,
-in gcc it can be 1.0fi)
-
-complex support is not mandatory in c99 conformant
-implementations and even if complex is supported
-there might or might not be separate imaginary type
-
-
-libm implementations
---------------------
-
-unix v7 libm by Robert Morris (rhm) around 1979
-see: "Computer Approximations" by Hart & Cheney, 1968
-used in: plan9, go
-
-cephes by Stephen L. Moshier, 1984-1992
-http://www.netlib.org/cephes/
-see: "Methods and Programs for Mathematical Functions" by Moshier 1989
-
-fdlibm by K. C. Ng around 1993, updated in 1995?
-(1993: copyright notice: "Developed at SunPro ..")
-(1995: copyright notice: "Developed at SunSoft ..")
-see: "Argument Reduction for Huge Arguments: Good to the Last Bit" by Ng 1992
-http://www.netlib.org/fdlibm
-used in: netbsd, openbsd, freebsd, bionic, musl
-
-libultim by Abraham Ziv around 2001 (gpl)
-(also known as the ibm accurate portable math lib)
-see: "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991
-http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/
-used in: glibc
-
-libmcr by K. C. Ng, 2004
-(sun's correctly rounded libm)
-
-mathcw by Nelson H. F. Beebe, 2010?
-http://ftp.math.utah.edu/pub/mathcw/
-(no sources available?)
-
-sleef by Naoki Shibata, 2010
-(simd lib for evaluating elementary functions)
-http://shibatch.sourceforge.net/
-
-crlibm by ens-lyon, 2004-2010
-http://lipforge.ens-lyon.fr/www/crlibm/
-see: crlibm.pdf at http://lipforge.ens-lyon.fr/frs/?group_id=8&release_id=123
-see: "Elementary Functions" by Jean-Michel Muller 2005
-see: "Handbook of Floating-Point Arithmetic" by (many authors from Lyon) 2009
-
-(closed: intel libm, hp libm for itanium, ..)
-
-
-libm tests
-----------
-
-http://www.netlib.org/fp/ucbtest.tgz
-http://www.jhauser.us/arithmetic/TestFloat.html
-http://cant.ua.ac.be/old/ieeecc754.html
-http://www.loria.fr/~zimmerma/mpcheck/
-http://gforge.inria.fr/projects/mpcheck/
-http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html
-http://www.math.utah.edu/~beebe/software/ieee/
-http://www.vinc17.org/research/testlibm/index.en.html
-http://www.vinc17.org/research/fptest.en.html
-tests in crlibm
-
-multiprecision libs (useful for tests)
-http://mpfr.org/
-http://www.multiprecision.org/index.php?prog=mpc
-http://pari.math.u-bordeaux.fr/
-http://www.apfloat.org/apfloat/
-http://code.google.com/p/fastfunlib/
-scs_lib in crlibm
-
-
-other
------
-
-ieee standard: http://754r.ucbtest.org/
-extended precision issues: http://www.vinc17.org/research/extended.en.html
-correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
-table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
-finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
-finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
-finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
-finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
-generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
-fast conversion to fixedpoint: http://stereopsis.com/sree/fpu2006.html
-double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
-papers by kahan: http://www.cs.berkeley.edu/~wkahan/
-fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
-math functions in general: http://dlmf.nist.gov/
-
-
-
-
-
-
-
-
-
-
-
+see http://nsz.repo.hu/libm