nsz Git - www/blob - libm/index.html

   1 <html><head><title>libm</title></head><body>
   2 <h2>libm</h2>
   3
   4 <p>This page is about designing libm for the
   5 <a href="http://www.etalabs.net/musl/">musl</a> libc.
   6 <p>Writing the math code from scratch is a huge work
   7 so already existing code is used.
   8
   9 <ul>
  10 <li><a href="#sources">Sources</a>
  11 <li><a href="#issues">Design issues</a>
  12 <li><a href="#representation">Representation</a>
  13 <li><a href="#ugly">Ugly</a>
  14 <li><a href="#implementations">libm implementations</a>
  15 <li><a href="#tests">libm tests</a>
  16 </ul>
  17
  18 <h3><a name="sources" href="#sources">Sources</a></h3>
  19 <p>The math code is mostly from freebsd which in turn
  20 is based on <a href="http://www.netlib.org/fdlibm/">fdlibm</a>.
  21 <p>sources:
  22 <ul>
  23 <li>freebsd /lib/msun (<a href="http://svnweb.FreeBSD.org/base/head/lib/msun/">browse</a>)
  24 <pre>
  25 svn checkout svn://svn.freebsd.org/base/head/lib/msun
  26 </pre>
  27 <li>openbsd /src/lib/libm (<a href="http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libm/">browse</a>)
  28 <pre>
  29 cvs -d <a href="http://openbsd.org/anoncvs.html#CVSROOT">$CVSROOT</a> get src/lib/libm
  30 </pre>
  31 </ul>
  32
  33 <h3><a name="issues" href="#issues">Design issues</a></h3>
  34 <p>The math code is available but there are still many design questions.
  35 <ul>
  36 <li>Code organization, build system tweaks:
  37 <p>There are architecture specific code,
  38 long double representation specific code
  39 and complex code.
  40 <p>With respect to long double, bsd libm code can handle
  41 ld64, ld80 and ld128 (where ld64 means long double == double),
  42 for musl it's enough to support ld64 and ld80, but
  43 keeping ld128 shouldn't hurt much either.
  44 <p>So far musl had arch specific code and common code,
  45 it seems a new category is needed:
  46 long double specific code, which can be ld64, ld80 or ld128.
  47 These should be tied to the architecture in the build system
  48 somehow. In freebsd most of the long double code is
  49 common (using arch specific macros, ifdefs) and only a
  50 small part of the ld code is maintained separately for ld80 and ld128,
  51 but it's also possible to keep all ld code separately (with duplications)
  52 in ld80 and ld128.
  53 <p>Complex support is optional in c99 so maybe it should
  54 be optional in the musl build as well. It also can be
  55 kept separately in cmath/ or with the rest of the code in math/.
  56 <p>Pending questions:
  57 <ul>
  58 <li>Do we want ld128?
  59 <li>Should we try to use common code for ld80 and ld128?
  60 <li>How to do ld64: wrap double functions or alias them?
  61 <li>How to tie the ld* code to the arch in the build system?
  62 <li>Make complex optional?
  63 <li>Keep complex in math/ or cmath/?
  64 </ul>
  65
  66 <li>Workarounds
  67 <p>The bsd libm code has many workarounds for various
  68 compiler issues. It's not clear what's the best way
  69 to handle the uglyness.
  70 <p>Pending questions:
  71 <ul>
  72 <li>Use STDC pragmas (eventhough gcc does not support them)?
  73 <li>Use volatile consistently to avoid evaluation precision and const folding issues?
  74 <li>Use special compiler flags against unwanted optimization (-frounding-math, -ffloat-store)?
  75 <li>Do inline/macro optimization for small functions? (isnan, isinf, signbit, creal, cimag,..)
  76 <li>In complex code prefer creal(), cimag() or a union to (un)pack re,im?
  77 </ul>
  78
  79 <li>Code cleanups
  80 <p>The fdlibm code style is inconsistent with musl (and ugly)
  81 so it makes sense to clean it up, but keeping easy diffability
  82 can be useful. There are various inconsistencies and
  83 correctness issues in the code which might worth addressing
  84 (eg. reliance on signed overflow should be eliminated).
  85 <p>Pending questions:
  86 <ul>
  87 <li>Keep diffability with freebsd/openbsd code or reformat the code for clarity?
  88 <li>Keep e_, s_, k_  fdlibm source file name prefixes?
  89 <li>Should 0x1p0 float format be preferred over decimal format?
  90 <li>Should isnan, signbit,.. be preferred over inplace bithacks?
  91 <li>Is unpacking a double into 2 int32_t ok (signed int representation)?
  92 <li>Is unpacking the mantissa of ld80 into an int64_t ok?
  93 </ul>
  94 </ul>
  95
  96 <h3><a name="representation" href="#representation">Representation</a></h3>
  97 <p>
  98 Binary representation of floating point numbers matter
  99 because lots of bithacks are needed in the math code.
 100 <p>
 101 float and double bit manipulation can be handled
 102 in a portable way in c:
 103 <ul>
 104 <li>float is ieee binary32
 105 <li>double is ieee binary64
 106 </ul>
 107 (and old non-standard representations are not supported)
 108 <p>
 109 The endianness may still vary, but that can be worked
 110 around by using a union with a single large enough
 111 unsigned int. (The only exception is probably arm/oabi fpa
 112 where the word order of a double is not consistent with the
 113 endianness [<a href="http://wiki.debian.org/ArmEabiPort#ARM_floating_points">debian wiki on arm</a>],
 114 but we probably won't support that)
 115 <p>
 116 long double bit manipulation is harder as there are
 117 various representations:
 118 <ul>
 119 <li>ld64: long double is the same as double (ieee binary64)
 120 <li>ld80: 80bit extended precision format [<a href="https://en.wikipedia.org/wiki/Extended_precision">wikipedia</a>]
 121 <li>ld128: quadruple precision, ieee binary128 [<a href="https://en.wikipedia.org/wiki/Quadruple-precision_floating-point_format">wikipedia</a>]
 122 </ul>
 123 (and other non-standard formats are not supported)
 124 <p>
 125 ld64 is easy to handle: all long double functions
 126 are aliased to the corresponding double one
 127 (long double is treated equivalently to double)
 128 <p>
 129 ld80 is the most common (i386, x86_64), it means
 130 64bit significand with explicit msb (inconsistent with other ieee formats),
 131 15bit exp, 1 sign bit.
 132 <p>
 133 ld128 is rare (sparc64 with software emulation), it means
 134 113bit significand with implicit msb, 15bit exp, 1 sign bit.
 135 <p>
 136 Endianness can vary (although on the supported i386 and x86_64 it is the same)
 137 and there is no large enough unsigned int to handle it.
 138 (In case of ld80 internal padding can vary as well, eg
 139 m68k and m88k cpus use different ld80 than the intel ones)
 140
 141
 142 <h3><a name="ugly" href="#ugly">Ugly</a></h3>
 143 <p>The ugly parts of libm hacking.
 144 <p>Some notes are from:
 145 <a href="http://www.vinc17.org/research/extended.en.html">http://www.vinc17.org/research/extended.en.html</a>
 146
 147
 148 <ul>
 149 <li>Double rounding:
 150 <p>
 151 If a value rounded twice the result can be different
 152 than rounding just once.
 153 <p>
 154 The C language allows arithmetic to be evaluated in
 155 higher precision than the operands have. If we
 156 use x87 fpu in extended precision mode it will round
 157 the results twice: round to 80bit when calculating
 158 and then round to 64bit when storing it, this can
 159 give different result than a single 64bit rounding.
 160 (on x86-linux the default fpu setting is to round the
 161 results in extended precision, this only affects x87 instructions, not see2 etc)
 162 (afaik freebsd and openbsd use double precision by default)
 163 <p>
 164 So x = a+b may give different results depending on
 165 the x87 fpu precision setting.
 166 (only happens in round to nearest rounding mode,
 167 but that's the most common one)
 168 <p>
 169 (double rounding can happen with float vs double as well)
 170 <p>
 171 C99 annex F prohibits double rounding,
 172 but that's non-normative.
 173
 174 <li>Wider exponent range (x87 issue):
 175 <p>
 176 Even if the fpu is set to double precision
 177 (which is not) the x87 registers use wider exponent
 178 range (mant:exp is 53:15 instead of 53:11 bits)
 179 so underflows (subnormals) may not be treated
 180 as expected. Rounding to double only occurs
 181 when a value is stored into memory.
 182 <p>
 183 Actually this beahviour is allowed by the ieee 754
 184 standard, but it can cause problems (see
 185 <a href="http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/">infamous php bug</a>)
 186
 187 <li>Evaluation precision:
 188 <p>
 189 C does not require consistent evaluation
 190 precision: the compiler may store intermediate
 191 results and round them to double while keep
 192 other parts in higher precision.
 193 So
 194 <pre>
 195 (a+b)==(a+b)
 196 </pre>
 197 may be false when the two sides
 198 are kept in different precision.
 199 (This is not an x87 specific problem, it matters whenever there
 200 is a higher precision fp type than the currently used one.
 201 It goes away if the highest precision (long double) is used
 202 everywhere, but that can have a huge penalty).
 203 <p>
 204 C99 has a way to control this (see
 205 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#5.1.2.3">5.1.2.3 example 4</a>,
 206 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.3.1.5">6.3.1.5</a> and
 207 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.3.1.8">6.3.1.8</a>):
 208 when a result is stored in a variable or a
 209 type cast is used, then it is guaranteed that
 210 the precision is appropriate to that type.
 211 So in
 212 <pre>
 213 (double)(a+b)==(double)(a+b)
 214 </pre>
 215 both sides are guaranteed to be in double precision
 216 when the comparision is done.
 217 <p>
 218 (This still does not solve the x87 double rounding
 219 issue though: eg if the left side is evaluated with
 220 sse2 and the right side with x87 extended precision setting
 221 and double rounding then the result may be false
 222 and still conformant to C99)
 223 <p>
 224 Unfortunately gcc does not respect the standard
 225 and even if assingment or cast is used the result
 226 may be kept in higher precision
 227 (infamous <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323">gcc bug323</a>
 228 also see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36578">gcc bug36578</a>).
 229 gcc 4.5 fixed it with '-fexcess-precision=standard'
 230 (it is enabled by '-std=c99', but the default is
 231 '-fexcess-precision=fast')
 232 <p>
 233 The workaround for older gcc is to force the
 234 compiler to store the intermediate results:
 235 by using volatile double temporary variables
 236 or by '-ffloat-store' (slow, and of course
 237 all intermediate results should be assigned
 238 to some variable, casting is not enough).
 239 <p>
 240 (Sometimes the excess precision is good
 241 but it's hard to rely on it as it is optional
 242 <p>
 243 There is a way to check for it though using
 244 FLT_EVAL_METHOD, float_t and double_t.
 245 But it's probably easier to unconditionally
 246 use higher precision variables when that's
 247 what we want and don't depend on the implicit
 248 excess precision).
 249
 250 <li>Compiler optimizations:
 251 <p>
 252 Runtime and compile time semantics may be different
 253 gcc often does unwanted or even invalid compile
 254 time optimizations.
 255 (constant folding where floating point exception
 256 flags should be raised at runtime, or the result
 257 depends on runtime floating point environment,
 258 or constant expressions are just evaluated with
 259 different precision than at runtime).
 260 <p>
 261 C99 actually allows most of these optimizations
 262 but they can be turned off with STDC pragmas (see
 263 <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.10.6">6.10.6</a>).
 264 Unfortunately <a href="http://gcc.gnu.org/c99status.html">gcc does not support these pragmas</a>.
 265 <p>
 266 FENV_ACCESS ON tells the compiler that the code wants
 267 to access the floating point environment (eg. set different rounding mode)
 268 (see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34678">gcc bug34678</a>).
 269 <p>
 270 (see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37845">gcc bug37845</a> for FP_CONTRACT pragma).
 271 <p>
 272 The workaround is again using named volatile
 273 variables for constants like
 274 <pre>
 275 static const volatile two52 = 0x1p52;
 276 </pre>
 277 and using the '-frounding-math' gcc flag.
 278 <p>
 279 (According the freebsd libm code gcc truncates
 280 long double const literals on i386.
 281 I haven't yet verified if this still the case,
 282 but as a workaround sometimes double-double arithmetics is used:
 283 initializing the long double constant from two doubles)
 284
 285 <li>ld80 vs ld128
 286 <p>
 287 The two representations are sufficiently different
 288 that treating them together is awkward.
 289 <p>
 290 In the freebsd libm code a few architecture specific
 291 macros and a union handle these issues, but the
 292 result is often less clear than treating ld80
 293 and ld128 separately.
 294
 295 <li>Signed int
 296 <p>
 297 The freebsd libm code has many inconsistencies
 298 (naming conventions, 0x1p0 notation vs decimal notation,..),
 299 one of them is the integer type used for bitmanipulations:
 300 The bits of a double are unpacked into one of
 301 int32_t, uint32_t and u_int32_t
 302 integer types.
 303 <p>
 304 int32_t is used most often which is wrong because of
 305 implementation defined signed int representation.
 306 <p>
 307 In general signed int is not handled carefully
 308 in the libm code: scalbn even depends on signed int overflow.
 309
 310 <li>Complex arithmetics
 311 <p>
 312 gcc turns
 313 <pre>
 314 x*I
 315 </pre>
 316 into
 317 <pre>
 318 (x+0*I)*(0+I) = 0*x + x*I
 319 </pre>
 320 So if x=inf then the result is nan+inf*I instead of inf*I
 321 (an fmul instruction is generated for 0*x)
 322 <p>
 323 (There were various complex constant folding issues as well in gcc).
 324 <p>
 325 So a+b*I cannot be used to create complex numbers,
 326 instead some double[2] manipulation should be used to
 327 make sure there is no fmul in the generated code
 328 (freebsd uses a static inline cpack function)
 329 (It's not clear which is better: using union hacks
 330 everywhere in the complex code or creal, cimag, cpack)
 331 <p>
 332 Complex code is hard to make portable across compilers
 333 as there are no good compiler support, it is rarely used
 334 and there are various options available in the standard:
 335 complex support is optional, imaginary type
 336 is optional.
 337 <p>
 338 It's not clear how I (or _Complex_I) should be defined
 339 in complex.h
 340 (literal of the float imaginary unit is compiler specific,
 341 in gcc it can be 1.0fi).
 342 </ul>
 343
 344 <h3><a name="implementations" href="#implementations">libm implementations</a></h3>
 345 <ul>
 346 <li>
 347 unix v7 libm by Robert Morris (rhm) around 1979<br>
 348 see: "Computer Approximations" by Hart & Cheney, 1968<br>
 349 used in: plan9, go
 350 <li>
 351 cephes by Stephen L. Moshier, 1984-1992<br>
 352 <a href="http://www.netlib.org/cephes/">http://www.netlib.org/cephes/</a><br>
 353 see: "Methods and Programs for Mathematical Functions" by Moshier 1989
 354 <li>
 355 fdlibm by K. C. Ng around 1993, updated in 1995?<br>
 356 (1993: copyright notice: "Developed at SunPro ..")<br>
 357 (1995: copyright notice: "Developed at SunSoft ..")<br>
 358 eg see: "Argument Reduction for Huge Arguments: Good to the Last Bit" by Ng 1992<br>
 359 <a href="http://www.netlib.org/fdlibm">http://www.netlib.org/fdlibm</a><br>
 360 used in: netbsd, openbsd, freebsd, bionic, musl
 361 <li>
 362 libultim by Abraham Ziv around 2001 (lgpl)<br>
 363 (also known as the ibm accurate portable math lib)<br>
 364 see: "Fast evaluation of elementary mathematical functions with correctly rounded last bit" by Ziv 1991<br>
 365 <a href="http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/">http://web.archive.org/web/20050207110205/http://oss.software.ibm.com/mathlib/</a><br>
 366 used in: glibc
 367 <li>
 368 libmcr by K. C. Ng, 2004<br>
 369 (sun's correctly rounded libm)
 370 <li>
 371 mathcw by Nelson H. F. Beebe, 2010?<br>
 372 <a href="http://ftp.math.utah.edu/pub/mathcw/">http://ftp.math.utah.edu/pub/mathcw/</a><br>
 373 (no sources available?)
 374 <li>
 375 sleef by Naoki Shibata, 2010<br>
 376 (simd lib for evaluating elementary functions)<br>
 377 <a href="http://shibatch.sourceforge.net/">http://shibatch.sourceforge.net/</a>
 378 <li>
 379 crlibm by ens-lyon, 2004-2010<br>
 380 <a href="http://lipforge.ens-lyon.fr/www/crlibm/">http://lipforge.ens-lyon.fr/www/crlibm/</a><br>
 381 see: <a href="http://lipforge.ens-lyon.fr/frs/?group_id=8&amp;release_id=123">crlibm.pdf</a><br>
 382 see: "Elementary Functions" by Jean-Michel Muller 2005<br>
 383 see: "Handbook of Floating-Point Arithmetic" by (many authors from Lyon) 2009<br>
 384 <li>
 385 various other closed ones: intel libm, hp libm for itanium,..
 386 </ul>
 387
 388 <h3><a name="tests" href="#tests">libm tests</a></h3>
 389 <ul>
 390 <li><a href="http://www.netlib.org/fp/ucbtest.tgz">ucbtest.tgz</a>
 391 <li><a href="http://www.jhauser.us/arithmetic/TestFloat.html">TestFloat</a>
 392 <li><a href="http://cant.ua.ac.be/old/ieeecc754.html">ieeecc754</a>
 393 <li><a href="http://www.loria.fr/~zimmerma/mpcheck/">mpcheck</a>,
 394 <a href="http://gforge.inria.fr/projects/mpcheck/">mpcheck</a>
 395 <li><a href="http://people.inf.ethz.ch/gonnet/FPAccuracy/Analysis.html">FPAccuracy</a>
 396 <li><a href="http://www.math.utah.edu/~beebe/software/ieee/">beebe's ieee tests</a>
 397 <li><a href="http://www.vinc17.org/research/testlibm/index.en.html">testlibm</a>
 398 <li><a href="http://www.vinc17.org/research/fptest.en.html">fptest</a>
 399 <li>tests in crlibm
 400 </ul>
 401 <p>
 402 multiprecision libs (useful for tests)
 403 <ul>
 404 <li><a href="http://mpfr.org/">mpfr</a>
 405 <li><a href="http://www.multiprecision.org/index.php?prog=mpc">mpc</a>
 406 <li><a href="http://pari.math.u-bordeaux.fr/">pari</a>
 407 <li><a href="http://www.apfloat.org/apfloat/">apfloat</a>
 408 <li><a href="http://code.google.com/p/fastfunlib/">fastfunlib</a>
 409 <li>scs_lib in crlibm
 410 </ul>
 411
 412 <p>other links
 413 <ul>
 414 <li>ieee standard: http://754r.ucbtest.org/
 415 <li>extended precision issues: http://www.vinc17.org/research/extended.en.html
 416 <li>correctly rounded mult: http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html
 417 <li>table based sincos: http://perso.ens-lyon.fr/damien.stehle/IMPROVEDGAL.html
 418 <li>finding worst cases: http://perso.ens-lyon.fr/damien.stehle/WCLR.html
 419 <li>finding worst cases: http://perso.ens-lyon.fr/damien.stehle/DECIMALEXP.html
 420 <li>finding worst cases: http://www.loria.fr/equipes/spaces/slz.en.html
 421 <li>finding worst cases: http://perso.ens-lyon.fr/jean-michel.muller/Intro-to-TMD.htm
 422 <li>generating libm functions: http://lipforge.ens-lyon.fr/www/metalibm/
 423 <li>fast conversion to fixedpoint: http://stereopsis.com/sree/fpu2006.html
 424 <li>double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdist/
 425 <li>papers by kahan: http://www.cs.berkeley.edu/~wkahan/
 426 <li>fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
 427 <li>math functions in general: http://dlmf.nist.gov/
 428 </ul>
 429
 430 <p><small>(<a href="/git/?p=www">page history</a>, <a href="/">home</a>)</small>
 431 </body></html>