nsz Git - www/blob - libm/index.html

   1 <html><head><title>libm</title></head><body>
   2 <h2>libm</h2>
   3
   4 <p>This page is about designing libm for the
   5 <a href="http://www.etalabs.net/musl/">musl</a> libc.
   6 <p>Writing the math code from scratch is a huge work
   7 so already existing code is used.
   8
   9 <ul>
  10 <li><a href="#branch">Branch</a>
  11 <li><a href="#sources">Sources</a>
  12 <li><a href="#issues">Design issues</a>
  13 <li><a href="#representation">Representation</a>
  14 <li><a href="#ugly">Ugly</a>
  15 <li><a href="#implementations">libm implementations</a>
  16 <li><a href="#tests">libm tests</a>
  17 </ul>
  18
  19 <h3><a name="branch" href="#branch">Branch</a></h3>
  20 <p>musl branch for math hacks:
  21 <ul>
  22 <li><a href="/git/?p=musl">browse</a> with gitweb
  23 <li>over git (recommended)
  24 <pre>
  25 git clone git://nsz.repo.hu:45100/repo/musl
  26 </pre>
  27 <li>over http (slower)
  28 <pre>
  29 git clone http://nsz.repo.hu/repo/musl
  30 </pre>
  31 <li>over ssh (passwd is anon)
  32 <pre>
  33 git clone ssh://anon@nsz.repo.hu:45022/repo/musl
  34 </pre>
  35 </ul>
  36
  37 <h3><a name="sources" href="#sources">Sources</a></h3>
  38 <p>The math code is mostly from freebsd which in turn
  39 is based on <a href="http://www.netlib.org/fdlibm/">fdlibm</a>.
  40 <p>sources:
  41 <ul>
  42 <li>freebsd /lib/msun (<a href="http://svnweb.FreeBSD.org/base/head/lib/msun/">browse</a>)
  43 <pre>
  44 svn checkout svn://svn.freebsd.org/base/head/lib/msun
  45 </pre>
  46 <li>openbsd /src/lib/libm (<a href="http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libm/">browse</a>)
  47 <pre>
  48 cvs -d <a href="http://openbsd.org/anoncvs.html#CVSROOT">$CVSROOT</a> get src/lib/libm
  49 </pre>
  50 </ul>
  51
  52 <h3><a name="issues" href="#issues">Design issues</a></h3>
  53
  54 <li>Workarounds
  55 <p>The bsd libm code has many workarounds for various
  56 compiler issues. It's not clear what's the best way
  57 to handle the uglyness.
  58 <p>In long double code fpu is assumed to be set to extended precision
  59 <p>Pending questions:
  60 <ul>
  61 <li>Use STDC pragmas (eventhough gcc does not support them)?
  62 <li>Where should we use volatile (to avoid evaluation precision and const folding issues)?
  63 <li>What tricks to use for proper exception raising? (huge*huge, etc)
  64 <li>Should signaling nans be considered?
  65 <li>Which compiler flags should be used (-frounding-math, -ffloat-store)?
  66 <li>TODO: inline/macro optimization for small functions (isnan, isinf, signbit, creal, cimag,..)
  67 </ul>
  68
  69 <li>Code cleanups
  70 <p>The fdlibm code has code style and correctness issues
  71 which might worth addressing.
  72 <p>Pending questions:
  73 <ul>
  74 <li>Can we use a polyeval(coeffs, x) function instead of writing out the poly?
  75 <li>TODO: prefer 0x1p0 format over decimal one
  76 <li>TODO: use 1.0f instead of (float)1 and const float one = 1;
  77 <li>TODO: use uint32_t instead of int32_t (to avoid signed int overflow and right shifts)
  78 <li>TODO: prefer isnan, signbit,.. over inplace bithacks
  79 </ul>
  80 </ul>
  81
  82 <h3><a name="representation" href="#representation">Representation</a></h3>
  83 <p>
  84 Binary representation of floating point numbers matter
  85 because lots of bithacks are needed in the math code.
  86 <p>
  87 float and double bit manipulation can be handled
  88 in a portable way in c:
  89 <ul>
  90 <li>float is ieee binary32
  91 <li>double is ieee binary64
  92 </ul>
  93 (<a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#F.2">C99 annex F</a>
  94 makes these mandatory)
  95 <p>
  96 The endianness may still vary, but that can be worked
  97 around by using a union with a single large enough
  98 unsigned int. (The only exception is probably arm/oabi fpa
  99 where the word order of a double is not consistent with the
 100 endianness [<a href="http://wiki.debian.org/ArmEabiPort#ARM_floating_points">debian wiki on arm</a>],
 101 but we probably won't support that)
 102 <p>
 103 long double bit manipulation is harder as there are
 104 various representations:
 105 <ul>
 106 <li>ld64: long double is the same as double (ieee binary64)
 107 <li>ld80: 80bit extended precision format [<a href="https://en.wikipedia.org/wiki/Extended_precision">wikipedia</a>]
 108 <li>ld128: quadruple precision, ieee binary128 [<a href="https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format">wikipedia</a>]
 109 </ul>
 110 (and other non-standard formats are not supported)
 111 <p>
 112 ld64 is easy to handle: all long double functions
 113 are just wrappers around the corresponding double ones
 114 (aliasing is non-conformant)
 115 <p>
 116 ld80 is the most common (i386, x86_64), it means
 117 64bit significand with explicit msb (inconsistent with other ieee formats),
 118 15bit exp, 1 sign bit.
 119 <p>
 120 ld128 is rare (sparc64 with software emulation), it means
 121 113bit significand with implicit msb, 15bit exp, 1 sign bit.
 122 <p>
 123 Endianness can vary (although on the supported i386 and x86_64 it is the same)
 124 and there is no large enough unsigned int to handle it.
 125 (In case of ld80 internal padding can vary as well, eg
 126 m68k and m88k cpus use different ld80 than the intel ones.)
 127 So there can be further variations in the binary representation
 128 than just ld80 and ld128.
 129
 130
 131 <h3><a name="ugly" href="#ugly">Ugly</a></h3>
 132 <p>The ugly parts of libm hacking.
 133 <p>Some notes are from:
 134 <a href="http://www.vinc17.org/research/extended.en.html">http://www.vinc17.org/research/extended.en.html</a>
 135
 136 <ul>
 137 <li>Double rounding:
 138 <p>
 139 If a value rounded twice the result can be different
 140 than rounding just once.
 141 <p>
 142 The C language allows arithmetic to be evaluated in
 143 higher precision than the operands have. If we
 144 use x87 fpu in extended precision mode it will round
 145 the results twice: round to 80bit when calculating
 146 and then round to 64bit when storing it, this can
 147 give different result than a single 64bit rounding.
 148 (on x86-linux the default fpu setting is to round the
 149 results in extended precision, this only affects x87 instructions, not see2 etc)
 150 (freebsd and openbsd use double precision by default)
 151 <p>
 152 So x = a+b may give different results depending on
 153 the x87 fpu precision setting.
 154 (only happens in round to nearest rounding mode,
 155 but that's the most common one)
 156 <p>
 157 (double rounding can happen with float vs double as well)
 158 <p>
 159 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#F.7.3">C99 annex F</a>
 160 prohibits double rounding, but that's non-normative.
 161
 162 <li>Wider exponent range (x87 issue):
 163 <p>
 164 Even if the fpu is set to double precision
 165 (which is not) the x87 registers use wider exponent
 166 range (mant:exp is 53:15 instead of 53:11 bits)
 167 so underflows (subnormals) may not be treated
 168 as expected. Rounding to double only occurs
 169 when a value is stored into memory.
 170 <p>
 171 Actually this beahviour is allowed by the ieee 754
 172 standard, but it can cause problems (see
 173 <a href="http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/">infamous php bug</a>)
 174
 175 <li>Evaluation precision:
 176 <p>
 177 C does not require consistent evaluation
 178 precision: the compiler may store intermediate
 179 results and round them to double while keep
 180 other parts in higher precision.
 181 (And the precision the compiler choses can
 182 be inconsistent: adding a printf to the code
 183 may change the result of a nearby calculation).
 184 So
 185 <pre>
 186 (a+b)==(a+b)
 187 </pre>
 188 may be false when the two sides
 189 are kept in different precision.
 190 (This is not an x87 specific problem, it matters whenever there
 191 is a higher precision fp type than the currently used one.
 192 It goes away if the highest precision (long double) is used
 193 everywhere, but that can have a huge penalty).
 194 <p>
 195 C99 has a way to control this (see
 196 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#5.1.2.3">5.1.2.3 example 4</a>,
 197 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.3.1.5">6.3.1.5</a> and
 198 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.3.1.8">6.3.1.8</a>):
 199 when a result is stored in a variable or a
 200 type cast is used, then it is guaranteed that
 201 the precision is appropriate to that type.
 202 So in
 203 <pre>
 204 (double)(a+b)==(double)(a+b)
 205 </pre>
 206 both sides are guaranteed to be in double precision
 207 when the comparision is done.
 208 <p>
 209 (This still does not solve the x87 double rounding
 210 issue though: eg if the left side is evaluated with
 211 sse2 and the right side with x87 extended precision setting
 212 and double rounding then the result may still be false)
 213 <p>
 214 Unfortunately gcc does not respect the standard
 215 and even if assingment or cast is used the result
 216 may be kept in higher precision
 217 (infamous <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323">gcc bug323</a>
 218 also see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36578">gcc bug36578</a>).
 219 gcc 4.5 fixed it with '-fexcess-precision=standard'
 220 (it is enabled by '-std=c99', but the default is
 221 '-fexcess-precision=fast')
 222 <p>
 223 The workaround for older gcc is to force the
 224 compiler to store the intermediate results:
 225 by using volatile double temporary variables
 226 or by '-ffloat-store' (which affects all
 227 intermediate results, but is not guaranteed
 228 by the gcc docs to always work).
 229 <p>
 230 (Sometimes the excess precision is good
 231 but it's hard to rely on it as it is optional
 232 <p>
 233 There is a way to check for it though using
 234 FLT_EVAL_METHOD, float_t and double_t.
 235 But it's probably easier to unconditionally
 236 use higher precision variables when that's
 237 what we want and don't depend on the implicit
 238 excess precision).
 239
 240 <li>Compiler optimizations:
 241 <p>
 242 Runtime and compile time semantics may be different
 243 gcc often does unwanted or even invalid compile
 244 time optimizations.
 245 (constant folding where floating point exception
 246 flags should be raised at runtime, or the result
 247 depends on runtime floating point environment,
 248 or constant expressions are just evaluated with
 249 different precision than at runtime).
 250 <p>
 251 C99 actually allows most of these optimizations
 252 but they can be turned off with STDC pragmas (see
 253 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.10.6">6.10.6</a>).
 254 Unfortunately <a href="http://gcc.gnu.org/c99status.html">gcc does not support these pragmas</a>.
 255 <p>
 256 FENV_ACCESS ON tells the compiler that the code wants
 257 to access the floating point environment (eg. set different rounding mode)
 258 (see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678">gcc bug34678</a>).
 259 <p>
 260 (see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37845">gcc bug37845</a> for FP_CONTRACT pragma
 261 which is relevant when the architecture has fma instruction, x86_64, i386, arm do not have it).
 262 <p>
 263 The workaround is again using named volatile
 264 variables for constants like
 265 <pre>
 266 static const volatile two52 = 0x1p52;
 267 </pre>
 268 and using the '-frounding-math' gcc flag.
 269 <p>
 270 (According the freebsd libm code gcc truncates
 271 long double const literals on i386.
 272 I haven't yet verified if this still the case,
 273 but as a workaround double-double arithmetics is used:
 274 initializing the long double constant from two doubles)
 275
 276 <li>ld80 vs ld128
 277 <p>
 278 The two representations are sufficiently different
 279 that treating them together is awkward.
 280 (Especially the explicit msb bit in ld80 can cause
 281 different behaviour).
 282 <p>
 283 In the freebsd libm code a few architecture specific
 284 macros and a union handle these issues, but the
 285 result is often less clear than treating ld80
 286 and ld128 separately.
 287
 288 <li>Signed zeros
 289 <p>Signed zeros can be tricky. They cannot be checked
 290 using the usual comparision operators (+0.0 == -0.0 is true),
 291 but they give different results in some cases
 292 (1/-0.0 == -inf) and can make similarly looking
 293 expressions different eg.
 294 (x + 0) is not the same as x, the former is +0.0 when x is -0.0
 295 <p>
 296 (To check for -0, the signbit macro can be used
 297 or the copysign function or bit manipulation.)
 298
 299 <li>Error handling
 300 <p>
 301 Arithmetics may set various floating point exception flags as a side effect.
 302 These can be queried and manipulated (fetestexcept, feraiseexcept,..).
 303 <p>
 304 So special care is needed
 305 when a library function wants to avoid changing
 306 the floating point status flags.
 307 eg. if one wants to check for -0 silently then
 308 <pre>
 309 if (x == 0.0 &amp;&amp; 1/x &lt; 0) { /* x is a -0 */ }
 310 </pre>
 311 is not ok: == raises invalid exception when x is nan,
 312 and even if we filter nans out 1/x will raise the
 313 divbyzero flag.
 314 <p>
 315 When a library wants to raise a flag deliberately
 316 but feraiseexcept is not available for some reason,
 317 then simple arithmetics can be be used just for their
 318 exception raising side effect
 319 (eg. 1/0.0 to raise divbyzero), however beaware
 320 of compiler optimizations (dead code elimination,..).
 321 <p>
 322 Unfortunately gcc does not always take fp exceptions into
 323 account: a simple x = 1e300*1e300; may not raise overflow
 324 exception at runtime, but get optimized into x = +inf.
 325 see compiler optimizations above.
 326 <p>
 327 Another x87 gcc bug related to fp exceptions is that
 328 comparision operators (==, &lt;, etc) don't raise invalid
 329 when an operand is nan
 330 (eventhough this is required by ieee + c99 annex F).
 331 (see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52451">gcc bug52451</a>).
 332 <p>
 333 The ieee standard defines signaling and quite nan
 334 floating-point numbers as well.
 335 The c99 standard only considers quiet nan, but it allows
 336 signaling nans to be supported as well.
 337 Without signaling nans x * 1 is equivalent to x,
 338 but if signaling nan is supported then the former
 339 raises an invalid exception.
 340 This may complicates things further if one wants to write
 341 portable fp math code.
 342 <p>
 343 A further libm design issue is the math_errhandling macro:
 344 it specifies the way math function errors can be checked
 345 which is either fp exceptions or the errno variable or both.
 346 The fp exception approach can be supported with enough care
 347 but errno is hard to support: certain library functions
 348 are implemented as a single asm instruction (eg sqrt),
 349 the only way to set errno is to query the fp exception flags
 350 and then set the errno variable based on that.
 351 So eventhough errno may be convenient in libm it is
 352 not the right thing to do.
 353
 354 <li>Complex arithmetics
 355 <p>
 356 gcc turns
 357 <pre>
 358 x*I
 359 </pre>
 360 into
 361 <pre>
 362 (x+0*I)*(0+I) = 0*x + x*I
 363 </pre>
 364 So if x=inf then the result is nan+inf*I instead of inf*I
 365 (an fmul instruction is generated for 0*x)
 366 <p>
 367 (There were various complex constant folding issues as well in gcc).
 368 <p>
 369 So a+b*I cannot be used to create complex numbers,
 370 instead some double[2] manipulation should be used to
 371 make sure there is no fmul in the generated code
 372 (freebsd uses a static inline cpack function)
 373 (It's not clear which is better: using union hacks
 374 everywhere in the complex code or creal, cimag, cpack)
 375 <p>
 376 Complex code is hard to make portable across compilers
 377 as there are no good compiler support, it is rarely used
 378 and there are various options available in the standard:
 379 complex support is optional, imaginary type
 380 is optional.
 381 <p>
 382 It's not clear how I (or _Complex_I) should be defined
 383 in complex.h
 384 (literal of the float imaginary unit is compiler specific,
 385 in gcc it can be 1.0fi).
 386
 387 <li>Signed int
 388 <p>
 389 The freebsd libm code has many inconsistencies
 390 (naming conventions, 0x1p0 notation vs decimal notation,..),
 391 one of them is the integer type used for bitmanipulations:
 392 The bits of a double are unpacked into one of
 393 int32_t, uint32_t and u_int32_t
 394 integer types.
 395 <p>
 396 int32_t is used most often which is wrong because of
 397 implementation defined signed int representation.
 398 <p>
 399 In general signed int is not handled carefully
 400 in the libm code: scalbn even depends on signed int overflow.
 401 </ul>
 402
 403 <h3><a name="implementations" href="#implementations">libm implementations</a></h3>
 404 <ul>
 405 <li>
 406 unix v7 libm by Robert Morris (rhm) around 1979<br>
 407 see: "Computer Approximations" by Hart & Cheney, 1968<br>
 408 used in: plan9, go
 409 <li>
 410 cephes by Stephen L. Moshier, 1984-1992<br>
 411 <a href="http://www.netlib.org/cephes/">http://www.netlib.org/cephes/</a><br>
 412 see: "Methods and Programs for Mathematical Functions" by Moshier 1989
 413 <li>
 414 fdlibm by K. C. Ng around 1993, updated in 1995?<br>
 415 (1993: copyright notice: "Developed at SunPro ..")<br>
 416 (1995: copyright notice: "Developed at SunSoft ..")<br>
 417 eg see: "Argument Reduction for Huge Arguments: Good to the Last Bit" by Ng 1992<br>
 418 <a href="http://www.netlib.org/fdlibm">http://www.netlib.org/fdlibm</a><br>
 419 used in: netbsd, openbsd, freebsd, bionic, musl
 420 <li>
 421 libultim by Abraham Ziv around 2001 (lgpl)<br>
 422 (also known as the ibm accurate portable math lib)<br>
 423 see: "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991<br>
 424 <a href="http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/">http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/</a><br>
 425 used in: glibc
 426 <li>
 427 libmcr by K. C. Ng, 2004<br>
 428 (sun's correctly rounded libm)
 429 <li>
 430 mathcw by Nelson H. F. Beebe, 2010?<br>
 431 <a href="http://ftp.math.utah.edu/pub/mathcw/">http://ftp.math.utah.edu/pub/mathcw/</a><br>
 432 (no sources available?)
 433 <li>
 434 sleef by Naoki Shibata, 2010<br>
 435 (simd lib for evaluating elementary functions)<br>
 436 <a href="http://shibatch.sourceforge.net/">http://shibatch.sourceforge.net/</a>
 437 <li>
 438 crlibm by ens-lyon, 2004-2010<br>
 439 <a href="http://lipforge.ens-lyon.fr/www/crlibm/">http://lipforge.ens-lyon.fr/www/crlibm/</a><br>
 440 see: <a href="http://lipforge.ens-lyon.fr/frs/?group_id=8&amp;release_id=123">crlibm.pdf</a><br>
 441 see: "Elementary Functions" by Jean-Michel Muller 2005<br>
 442 see: "Handbook of Floating-Point Arithmetic" by (many authors from Lyon) 2009<br>
 443 <li>
 444 various other closed ones: intel libm, hp libm for itanium,..
 445 </ul>
 446
 447 <h3><a name="tests" href="#tests">libm tests</a></h3>
 448 <ul>
 449 <li><a href="http://www.netlib.org/fp/ucbtest.tgz">ucbtest.tgz</a>
 450 <li><a href="http://www.jhauser.us/arithmetic/TestFloat.html">TestFloat</a>
 451 <li><a href="http://cant.ua.ac.be/old/ieeecc754.html">ieeecc754</a>
 452 <li><a href="http://www.loria.fr/~zimmerma/mpcheck/">mpcheck</a>,
 453 <a href="http://gforge.inria.fr/projects/mpcheck/">mpcheck</a>
 454 <li><a href="http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html">FPAccuracy</a>
 455 <li><a href="http://www.math.utah.edu/~beebe/software/ieee/">beebe's ieee tests</a>
 456 <li><a href="http://www.math.utah.edu/pub/elefunt/">elefunt</a>
 457 <li><a href="http://www.vinc17.org/research/testlibm/index.en.html">testlibm</a>
 458 <li><a href="http://www.vinc17.org/research/fptest.en.html">fptest</a>
 459 <li>tests in crlibm
 460 </ul>
 461 <p>
 462 multiprecision libs (useful for tests)
 463 <ul>
 464 <li><a href="http://mpfr.org/">mpfr</a>
 465 <li><a href="http://www.multiprecision.org/index.php?prog=mpc">mpc</a>
 466 <li><a href="http://pari.math.u-bordeaux.fr/">pari</a>
 467 <li><a href="http://www.apfloat.org/apfloat/">apfloat</a>
 468 <li><a href="http://code.google.com/p/fastfunlib/">fastfunlib</a>
 469 <li>scs_lib in crlibm
 470 </ul>
 471
 472 <p>other links
 473 <ul>
 474 <li>ieee standard: http://754r.ucbtest.org/
 475 <li>extended precision issues: http://www.vinc17.org/research/extended.en.html
 476 <li>correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
 477 <li>table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
 478 <li>finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
 479 <li>finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
 480 <li>finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
 481 <li>finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
 482 <li>generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
 483 <li>fast conversion to fixedpoint: http://stereopsis.com/sree/fpu2006.html
 484 <li>double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
 485 <li>papers by kahan: http://www.cs.berkeley.edu/~wkahan/
 486 <li>fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
 487 <li>math functions in general: http://dlmf.nist.gov/
 488 </ul>
 489
 490 <p><small>(<a href="/git/?p=www">page history</a>, <a href="/">home</a>)</small>
 491 </body></html>