nsz Git - www/blob - libm.txt

   1 sources
   2 -------
   3
   4 fdlibm sources
   5 (freebsd is most up to date probably, but some code is
   6 only available from openbsd)
   7
   8 freebsd /lib/msun
   9 svn checkout svn://svn.freebsd.org/base/head/lib/msun
  10
  11 openbsd /src/lib/libm
  12 cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm
  13
  14
  15 code organization
  16 -----------------
  17
  18 math code needs some special code organization
  19 or build system tweaks
  20
  21 * long double:
  22 (representation issues are below)
  23 code organization constraints:
  24 1.) internal .h to bit manipulate float and double
  25 2.) long double representation specific .h
  26 3.) long double representation specific .c
  27 4.) common long double code for all representations
  28 5.) no long double code when long double == double
  29 6.) arch specific .s
  30
  31 1. can be put in src/internal
  32 2. and 3. can be handled like arch specific code
  33 (separate directories for different long double
  34 representations and then somehow hook them up
  35 in the build system)
  36 4. can be regular source files in math/ but needs
  37 ifdefs due to 5.
  38 all double functions need an extra wrapper/alias
  39 to solve 5.
  40 6. can be done the usual way
  41
  42 dir tree:
  43 /include/math.h
  44 /include/complex.h
  45 /arch/ld80/ldouble.h - duplicate per arch ?
  46 /arch/ld128/ldouble.h
  47 /src/internal/libm.h
  48 /src/math/ld80/*l.c
  49 /src/math/ld128/*l.c
  50 /src/math/arm/*.s
  51 /src/math/i386/*.s
  52 /src/math/x86_64/*.s
  53 /src/math/*.c
  54
  55 in freebsd most of the long double code is common
  56 to the different representations but there are some
  57 representation specific files in ld80/ and ld128/
  58
  59 it is possible to have all the long double code
  60 duplicated and maintained separately in ld80/ and
  61 ld128/ (or just ld80/ as ld128/ is rarely needed),
  62 the implementations always have a few differences
  63 (eventhough most of the logic is the same and the
  64 constants only differ in precision etc)
  65 so 4. can be mostly eliminated
  66 or have no separate ld80/ and ld128/ but use ifdefs
  67 and macros in the *l.c codes
  68 os 3. can be eliminated
  69
  70 * complex:
  71 it makes sense to make complex code optional
  72 (compiler might not support it)
  73 this needs some hacks: ifdefed sources or
  74 some buildsystem hack
  75
  76 * code style and naming:
  77 fdlibm has many ugly badly indented code
  78 it makes sense to clean them up a bit if we
  79 touch them anyway, but it's also good if the
  80 code can be easily compared to other fdlibms..
  81
  82 fdlibm uses e_ s_ k_ w_ source code prefixes
  83 (i guess they stand for ieee, standard c, kernel and
  84 wrapper functions, but i'm not sure, i'd rather drop
  85 these if fdlibm diffability was not important)
  86
  87
  88 representation
  89 --------------
  90
  91 float and double bit manipulation can be handled
  92 in a portable way in c
  93
  94 float is ieee binary32
  95 double is ieee binary64
  96 (assuming old non-standard representations are not supported)
  97
  98 the endianness may still vary, but that can be worked
  99 around by using a union with a single large enough
 100 unsigned int
 101
 102 long double bit manipulation is harder as there are
 103 various representations:
 104 1.) same as double: ieee binary64
 105 2.) ld80: 80bit extended precision format
 106 3.) ld128: quadruple precision, ieee binary128
 107 (assuming other non-standard formats are not supported)
 108
 109 case 1. is easy to handle: all long double functions
 110 are aliased to the corresponding double one
 111 (long double is treated equivalently to double)
 112
 113 case 2. is the most common (i386, x86_64, emulation)
 114 case 3. is rare and means software emulation (sparc64)
 115
 116 ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit
 117 ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit
 118
 119 endianness can vary (and there is no large enough unsigned
 120 int to handle it), in case of ld80 padding can vary as well
 121 (eg the 80bit float of m68k and m88k cpus has different
 122 padding than the intel one)
 123
 124 supporting ld80 and ld128 with the same code can be done,
 125 but requres ifdefs and macro hacks, treating them separately
 126 means many code duplications
 127
 128
 129 ugly
 130 ----
 131
 132 some notes are from:
 133 http://www.vinc17.org/research/extended.en.html
 134
 135 1.) double rounding
 136 the c language allows arithmetic to be evaluated in
 137 higher precision and on x86 linux the default fpu
 138 setting is to round the results in extended precision
 139 (only x87 instructions are affected, sse2 is not)
 140 (freebsd and openbsd use double precision by default)
 141
 142 rounding a result in extended precision first then
 143 rounding it to double when storing it may give
 144 different results than a single rounding to double
 145 so x = a+b may give different results depending on
 146 the fpu setting
 147 (only happens in round to nearest rounding mode,
 148 but that's the most common)
 149
 150 2.) wider exponent range
 151 even if the fpu is set to double precision
 152 (which is not) the x87 registers use wider exponent
 153 range so underflows (subnormals) may not be treated
 154 according to the ieee standard (unless every
 155 computation result is stored and rounded to double)
 156
 157 3.) precision of intermediate results
 158 c does not require consistent evaluation
 159 precision: the compiler may store intermediate
 160 results and round them to double while keep
 161 other parts in extended precision
 162 so (a+b)==(a+b) may be false when the two sides
 163 are kept in different precision
 164
 165 c99 has a way to control this:
 166 when a result is stored in a variable or a
 167 type cast is used, then it is guaranteed that
 168 the precision is appropriate to that type
 169 so in (double)(a+b)==(double)(a+b) both
 170 sides are guaranteed to be in double precision
 171 when the comparision is done
 172 (this still does not solve 1. though: eg if
 173 the left side is evaluated with sse2 and the
 174 right side with x87 then the result may be false)
 175
 176 unfortunately gcc does not respect the standard
 177 (infamous gcc 323 bug)
 178 gcc 4.5 fixed it with '-fexcess-precision=standard'
 179 (it is enabled by '-std=c99', but the default is
 180 '-fexcess-precision=fast')
 181
 182 the workaround for older gcc is to force the
 183 compiler to store the intermediate results:
 184 by using volatile double temporary variables
 185 or by '-ffloat-store' (slow)
 186
 187 (sometimes the excess precision is good
 188 but it's hard to rely on it as it is optional
 189
 190 there is a way to check for it though using
 191 FLT_EVAL_METHOD, float_t and double_t
 192
 193 but even then it's probably easier to
 194 unconditionally cast the variables of an
 195 expression to a higher precision when
 196 that's what we want and don't depend
 197 on the implicit excess precision)
 198
 199 4.) runtime vs compile time
 200 runtime and compile time semantics may be different
 201 gcc often does unwanted or even invalid compile
 202 time optimizations
 203 (constant folding where floating point exception
 204 flags should be raised at runtime, or the result
 205 depends on runtime floating point environment,
 206 or constant expressions are just evaluated with
 207 different precision than at runtime)
 208
 209 the workaround is again using named volatile
 210 variables like
 211 static const volatile two52 = 0x1p52;
 212
 213 (?) old gcc truncated long double const expressions
 214 on x86, the workaround is to put the constant
 215 together from doubles at runtime
 216
 217 5.) ld80 vs ld128 bit representations (see above)
 218 the two representations are sufficiently different
 219 that treating them together is awkward
 220
 221 ld80 does not have implicit msb and the mantissa is
 222 best handled by a single uint64_t
 223 ld128 has implicit msb and the mantissa cannot be
 224 handled in a single unsigned int
 225
 226 (typical operations:
 227 get/set exp and sign bit (same for ld80 and ld128)
 228 check x==0, mant==0 (ld80: explicit bit should be masked)
 229 check |x| < 1, 0.5 (exp + mant manipulation)
 230 check if subnormal
 231 check if integral)
 232
 233 in the freebsd fdlibm code a few architecture specific
 234 macros and a union handle these issues
 235
 236 6.) signed int
 237 the fdlibm code has many inconsistencies
 238 naming conventions, 0x1p0 notation vs decimal notation,
 239 ..and integer type used for bitmanipulation
 240
 241 when the bits of a double is manipulated it is unpacked
 242 into two 32bit words, which can be
 243 int32_t, uint32_t, u_int32_t
 244
 245 int32_t is used most often which is wrong because of
 246 implementation defined signed int representation
 247 (in general signed int is not handled carefully
 248 scalbn even depends on signed int overflow)
 249
 250 7.) complex is not implemented in gcc properly
 251
 252 x*I turns into (x+0*I)*(0+I) = 0*x + x*I
 253 so if x=inf then the result is nan+inf*I instead of inf*I
 254 (an fmul instruction is generated for 0*x)
 255
 256 so a+b*I cannot be used to create complex numbers
 257 instead some double[2] manipulation should be used to
 258 make sure there is no fmul in the generated code
 259 (freebsd uses a static inline cpack function)
 260
 261 it's not clear which is better: using union hacks
 262 everywhere in the complex code or creal, cimag, cpack
 263
 264 union dcomplex {
 265         double complex z;
 266         double a[2];
 267 };
 268 // for efficiency these should be macros or inline functions
 269 #define creal(z) ((double)(z))
 270 #define cimag(z) ((union dcomplex){(z)}.a[1])
 271 #define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z)
 272
 273 eg. various implementations for conj:
 274
 275 // proper implementation, but gcc does not do the right thing
 276 double complex conj(double complex z) {
 277         return creal(z) - cimag(z)*I;
 278 }
 279
 280 // using cpack
 281 double complex conj(double complex z) {
 282         return cpack(creal(z), -cimag(z));
 283 }
 284
 285 // explicit work around
 286 double complex conj(double complex z) {
 287         union dcomplex u = {z};
 288
 289         u.a[1] = -u.a[1];
 290         return u.z;
 291 }
 292
 293 8.) complex code is hard to make portable across compilers
 294
 295 eg it's not clear how I (or _Complex_I) should be defined
 296 in complex.h
 297 (literal of the float imaginary unit is compiler specific,
 298 in gcc it can be 1.0fi)
 299
 300 complex support is not mandatory in c99 conformant
 301 implementations and even if complex is supported
 302 there might or might not be separate imaginary type
 303
 304
 305 libm implementations
 306 --------------------
 307
 308 unix v7 libm by Robert Morris (rhm) around 1979
 309 see: "Computer Approximations" by Hart & Cheney, 1968
 310 used in: plan9, go
 311
 312 cephes by Stephen L. Moshier, 1984-1992
 313 http://www.netlib.org/cephes/
 314 see: "Methods and Programs for Mathematical Functions" by Moshier 1989
 315
 316 fdlibm by K. C. Ng around 1993, updated in 1995?
 317 (1993: copyright notice: "Developed at SunPro ..")
 318 (1995: copyright notice: "Developed at SunSoft ..")
 319 see: "Argument Reduction for Huge Arguments: Good to the Last Bit" by Ng 1992
 320 http://www.netlib.org/fdlibm
 321 used in: netbsd, openbsd, freebsd, bionic, musl
 322
 323 libultim by Abraham Ziv around 2001 (gpl)
 324 (also known as the ibm accurate portable math lib)
 325 see: "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991
 326 http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/
 327 used in: glibc
 328
 329 libmcr by K. C. Ng, 2004
 330 (sun's correctly rounded libm)
 331
 332 mathcw by Nelson H. F. Beebe, 2010?
 333 http://ftp.math.utah.edu/pub/mathcw/
 334 (no sources available?)
 335
 336 sleef by Naoki Shibata, 2010
 337 (simd lib for evaluating elementary functions)
 338 http://shibatch.sourceforge.net/
 339
 340 crlibm by ens-lyon, 2004-2010
 341 http://lipforge.ens-lyon.fr/www/crlibm/
 342 see: crlibm.pdf at http://lipforge.ens-lyon.fr/frs/?group_id=8&release_id=123
 343 see: "Elementary Functions" by Jean-Michel Muller 2005
 344 see: "Handbook of Floating-Point Arithmetic" by (many authors from Lyon) 2009
 345
 346 (closed: intel libm, hp libm for itanium, ..)
 347
 348
 349 libm tests
 350 ----------
 351
 352 http://www.netlib.org/fp/ucbtest.tgz
 353 http://www.jhauser.us/arithmetic/TestFloat.html
 354 http://cant.ua.ac.be/old/ieeecc754.html
 355 http://www.loria.fr/~zimmerma/mpcheck/
 356 http://gforge.inria.fr/projects/mpcheck/
 357 http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html
 358 http://www.math.utah.edu/~beebe/software/ieee/
 359 http://www.vinc17.org/research/testlibm/index.en.html
 360 http://www.vinc17.org/research/fptest.en.html
 361 tests in crlibm
 362
 363 multiprecision libs (useful for tests)
 364 http://mpfr.org/
 365 http://www.multiprecision.org/index.php?prog=mpc
 366 http://pari.math.u-bordeaux.fr/
 367 http://www.apfloat.org/apfloat/
 368 http://code.google.com/p/fastfunlib/
 369 scs_lib in crlibm
 370
 371
 372 other
 373 -----
 374
 375 ieee standard: http://754r.ucbtest.org/
 376 extended precision issues: http://www.vinc17.org/research/extended.en.html
 377 correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
 378 table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
 379 finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
 380 finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
 381 finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
 382 finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
 383 generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
 384 fast conversion to fixedpoint: http://stereopsis.com/sree/fpu2006.html
 385 double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
 386 papers by kahan: http://www.cs.berkeley.edu/~wkahan/
 387 fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
 388 math functions in general: http://dlmf.nist.gov/
 389
 390
 391
 392
 393
 394
 395
 396
 397
 398
 399