libm

X-Git-Url: http://nsz.repo.hu/git/?a=blobdiff_plain;f=libm%2Findex.html;h=6eb718c2a3be826149d31ec3a4138889218c8b2d;hb=d29a46b37da36a35909c3f48a2717ee545f16c8e;hp=abdd7f844829c334dac042ff99a52f5adf65bdad;hpb=a1c9bbca9f66a10878d905a86d6d0d99797fd77d;p=www diff --git a/libm/index.html b/libm/index.html index abdd7f8..6eb718c 100644 --- a/libm/index.html +++ b/libm/index.html @@ -1,14 +1,12 @@ libm

libm

This page is about designing libm for the -musl libc. -

Writing the math code from scratch is a huge work -so already existing code is used. +

This page is about libm for the +musl libc.

Sources -
Design issues +
General rules
Representation
Ugly
libm implementations @@ -16,8 +14,13 @@ so already existing code is used.

Sources

The math code is mostly from freebsd which in turn -is based on fdlibm. +

Writing math code from scratch is a huge work so already existing code is +used. Several math functions are taken from the +freebsd libm and a few from the +openbsd libm implementations. +Both of them are based on fdlibm. +The freebsd libm seems to be the most well maintained and most correct version +of fdlibm.

sources:

freebsd /lib/msun (browse) @@ -30,91 +33,108 @@ cvs -d $CVSROOT get src/li

Design issues

The math code is available but there are still many design questions. +

General rules

Code organization, build system tweaks: -
There are architecture specific code, -long double representation specific code -and complex code. -
With respect to long double, bsd libm code can handle -ld64, ld80 and ld128 (where ld64 means long double == double), -for musl it's enough to support ld64 and ld80, but -keeping ld128 shouldn't hurt much either. -
So far musl had arch specific code and common code, -it seems a new category is needed: -long double specific code, which can be ld64, ld80 or ld128. -These should be tied to the architecture in the build system -somehow. In freebsd most of the long double code is -common (using arch specific macros, ifdefs) and only a -small part of the ld code is maintained separately for ld80 and ld128, -but it's also possible to keep all ld code separately (with duplications) -in ld80 and ld128. -
Complex support is optional in c99 so maybe it should -be optional in the musl build as well. It also can be -kept separately in cmath/ or with the rest of the code in math/. -
Pending questions: +
Assumptions about floating-point representation and arithmetics +(see c99 annex F.2):
- Do we want ld128? -
- Should we try to use common code for ld80 and ld128? -
- How to do ld64: wrap double functions or alias them? -
- How to tie the ld* code to the arch in the build system? -
- Make complex optional? -
- Keep complex in math/ or cmath/? -
- -
Workarounds -
The bsd libm code has many workarounds for various -compiler issues. It's not clear what's the best way -to handle the uglyness. -
Pending questions: -
- Use STDC pragmas (eventhough gcc does not support them)? -
- Use volatile consistently to avoid evaluation precision and const folding issues? -
- Use special compiler flags against unwanted optimization (-frounding-math, -ffloat-store)? -
- Do inline/macro optimization for small functions? (isnan, isinf, signbit, creal, cimag,..) -
- In complex code prefer creal(), cimag() or a union to (un)pack re,im? -
- -
Code cleanups -
The fdlibm code style is inconsistent with musl (and ugly) -so it makes sense to clean it up, but keeping easy diffability -can be useful. There are various inconsistencies and -correctness issues in the code which might worth addressing -(eg. reliance on signed overflow should be eliminated). -
Pending questions: -
- Keep diffability with freebsd/openbsd code or reformat the code for clarity? -
- Keep e_, s_, k_ fdlibm source file name prefixes? -
- Should 0x1p0 float format be preferred over decimal format? -
- Should isnan, signbit,.. be preferred over inplace bithacks? -
- Is unpacking a double into 2 int32_t ok (signed int representation)? -
- Is unpacking the mantissa of ld80 into an int64_t ok? +
- float is ieee binary32 +
- double is ieee binary64 +
- long double is either ieee binary64 or little-endian 80bit extended precision (x87 fpu)
+(other long double representations may be supported in the future, until then +long double math functions will be missing on non-supported platforms) +
On soft-float architectures fenv should not be expected to work according to +the c and ieee standards (ie. rounding modes and exceptions are not supported, +the fenv functions are dummy ones) +
Floating-point exception flags should be properly set in math functions +according to c99 annex F, but without using fenv.h +(eg. overflow flag can be raised by 0x1p900*0x1p900, because this works even +without fenv support) +
Most functions need to be precise only in nearest rounding mode. +
Returned floating-point values should be correctly rounded in most cases, +but the last bit may be wrong: +
```
+    |error| < 1.5 ulp
+
```
+should hold for most functions. +(error is the difference between the exact result and the calculated +floating-point value) +(in theory correct rounding can be achieved but with big implementation cost, +see crlibm) +
At least the following functions must be correctly rounded: +ceil, copysign, fabs, fdim, floor, fma, fmax, fmin, frexp, ldexp, logb, +modf, nearbyint, nextafter, nexttoward, rint, round, scalbln, scalbn, +sqrt, trunc. +
Mathematical properties of functions should be as expected +(monotonicity, range, symmetries). +
If the FPU precision is altered then nothing is guaranteed to work. +(ie. when long double does not have full 80bit precision on i386 then things +may break, this also means that compiler must spill fpu registers at 80bits +precision) +
Signaling NaN is not supported +
Quiet NaN is supported but all NaNs are treated equally without special +attention to the internal representation of a NaN +(eg. the sign of NaN may not be preserved). +
Most gcc bug workarounds should be removed from the code +(STRICT_ASSIGN macro is used when excessive precision is harmful and +FORCE_EVAL when expressions must be evaluated for their side-effect, other +usage of volatile is not justified, hacks around long double constants are +not justified eventhough gcc can miscompile those with non-default FPU setting) +
When excessive precision is not harmful, temporary variables +should be float_t or double_t (so on i386 no superfluous store is +generated) +
Whenever fenv is accessed the FENV_ACCESS pragma of c99 should be used +(eventhough gcc does not yet support it), and all usage of optional FE_ +macros should be protected by #ifdef +
For bit manipulation of floating-point values an union should be used +(eg. union {float f; uint32_t i;}) +
uint32_t and uint64_t should be used for bit manipulations. +(eg signed int must not be used in bit shifts etc when it might invoke +undefined or implementation defined behaviour). +
POSIX namespace rules must be respected. +
c99 hexfloat syntax (0x1.0p0) should be used when it makes the code +clearer, but not in public header files +(those should be c++ and ansi c compatible) +
The 'f' suffix should be used for single precision values (0.1f) when the +value cannot be exactly represented or the type of the arithmetics is important +((float)0.1 is not ok, that style may lead to double rounding issues, but eg. +1.0 or 0.5 may be used instead of 1.0f or 0.5f in some cases) +
Prefer classification macros (eg. isnan) over inplace bit hacks. +
For every math function there should be a c implementation. +(a notable exception now is sqrtl, since most fpu has instruction for it +and on soft-float architectures long double == double) +
The c implementation of a long double function should use ifdefs with the +LDBL_MANT_DIG etc constants from float.h for architecture specific +implementations. +
In the musl source tree math.h functions go to src/math, complex.h functions +to src/complex and fenv.h functions to src/fenv. And files are named after the +functions they implement.

Representation

Binary representation of floating point numbers matter -because lots of bithacks are needed in the math code. +because bit hacks are often needed in the math code. +(in particular bit hacks are used instead of relational operations for nan +and sign checks becuase relational operators raise invalid fp exception on nan +and they treat -0.0 and +0.0 equally and more often than not these are not desired)

-float and double bit manipulation can be handled -in a portable way in c: +float and double bit manipulation can be handled in a portable way in c using +union types:

float is ieee binary32 -
double is ieee binary64 +
union {float f; uint32_t i;}; +
union {double f; uint64_t i;};

-(and old non-standard representations are not supported) -

-The endianness may still vary, but that can be worked -around by using a union with a single large enough -unsigned int. (The only exception is probably arm/oabi fpa -where the word order of a double is not consistent with the -endianness [debian wiki on arm], -but we probably won't support that) -

-long double bit manipulation is harder as there are -various representations: +(assuming the bits in the object representation of 32bit and 64bit unsigned ints +map to the floating-point representation according to ieee-754, this is not +always the case, eg. old +arm floating-point accelerator +(FPA) used mixed endian double representation, but musl does not support the old +arm ABI) +

+long double bit manipulation is harder as there are various representations +and some of them don't map to any unsigned integer type:

ld64: long double is the same as double (ieee binary64)
ld80: 80bit extended precision format [wikipedia] @@ -122,31 +142,50 @@ various representations:

(and other non-standard formats are not supported)

-ld64 is easy to handle: all long double functions -are aliased to the corresponding double one -(long double is treated equivalently to double) +In case of ld64 the bit manipulation is the same as with double +and all long double math functions can be just wrappers around the +corresponding double ones. +(using symbol aliasing on the linker level is non-conformant +since functions would not have unique address then)

-ld80 is the most common (i386, x86_64), it means -64bit significand with explicit msb (inconsistent with other ieee formats), -15bit exp, 1 sign bit. +ld80 is the most common long double on linux (i386 and x86_64 abi), +it means 64bit significand with explicit msb +(inconsistent with other ieee formats), 15bit exp, 1 sign bit. +The m68k (and m88k) architecture uses the same format, but different endianness: +

union {long double f; struct{uint64_t m; uint16_t se; uint16_t pad;} i;}; // x86 +
union {long double f; struct{uint16_t se; uint16_t pad; uint64_t m;} i;}; // m68k +

+where m is the significand and se is the sign and exponent.

-ld128 is rare (sparc64 with software emulation), it means -113bit significand with implicit msb, 15bit exp, 1 sign bit. +ld128 is rare (eg. sparc64 with software emulation), it means +113bit significand with implicit msb, 15bit exp, 1 sign bit: +

union {long double f; struct{uint16_t se; uint16_t hi; uint32_t mid; uint64_t lo;} i;}; +

-Endianness can vary (although on the supported i386 and x86_64 it is the same) -and there is no large enough unsigned int to handle it. -(In case of ld80 internal padding can vary as well, eg -m68k and m88k cpus use different ld80 than the intel ones) - +There are other non-conformant long double types: eg. the old SVR4 abi for ppc +uses 128 bit long doubles, but it's software emulated and traditionally +implemented using +two doubles +(also called ibm long double as this is what ibm aix used on ppc). +The ibm s390 supports the ieee 754-2008 compliant binary128 floating-point +format, but previous ibm machines (S/370, S/360) used slightly different +representation. +

+This variation shows the difficulty to consistently handle +long double: the solution is to use ifdefs based on float.h and +on the endianness and write different code for different architectures.

Ugly

The ugly parts of libm hacking.

Some notes are from: http://www.vinc17.org/research/extended.en.html - +

Useful info about floating-point in gcc: +http://gcc.gnu.org/wiki/FloatingPointMath

Double rounding (x87 issue): +
Double rounding:
If a value rounded twice the result can be different than rounding just once. @@ -158,16 +197,30 @@ the results twice: round to 80bit when calculating and then round to 64bit when storing it, this can give different result than a single 64bit rounding. (on x86-linux the default fpu setting is to round the -results in extended precision, this only affects x87 instructions, not see2 etc) -(afaik freebsd and openbsd use double precision by default) +results in extended precision, this only affects x87 instructions, not sse2 etc) +(freebsd and openbsd use double precision by default)
So x = a+b may give different results depending on -the fpu setting. +the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
-C99 annex F prohibits double rounding, -but that's non-normative. +(double rounding can happen with float vs double as well) +
+C99 annex F +prohibits double rounding, but that's non-normative. +
+Note that the value of the result can only be ruined by +double rounding in nearest rounding mode, but the double +rounding issue haunts directed rounding modes as well: +raising the underflow flag might be omitted. +On x86 with downward rounding +
```
+(double)(0x1p-1070 + 0x1p-2000L)
+
```
+does not raise underflow (only inexact) eventhough the +final result is an inexact subnormal. +
Wider exponent range (x87 issue):
@@ -188,12 +241,22 @@ C does not require consistent evaluation precision: the compiler may store intermediate results and round them to double while keep other parts in higher precision. -So (a+b)==(a+b) may be false when the two sides +(And the precision the compiler choses can +be inconsistent: adding a printf to the code +may change the result of a nearby calculation). +So +
```
+(a+b)==(a+b)
+
```
+may be false when the two sides are kept in different precision. (This is not an x87 specific problem, it matters whenever there -is a higher precision fp type than the currently used one. +is a higher precision fp type than the currently used one and +FLT_EVAL_METHOD!=0. It goes away if the highest precision (long double) is used everywhere, but that can have a huge penalty). +(clang uses sse by default on i386 with FLT_EVAL_METHOD==0, +while gcc uses the 80bit x87 fp registers and FLT_EVAL_METHOD==2)
C99 has a way to control this (see 5.1.2.3 example 4, @@ -202,15 +265,17 @@ C99 has a way to control this (see when a result is stored in a variable or a type cast is used, then it is guaranteed that the precision is appropriate to that type. -So in (double)(a+b)==(double)(a+b) both -sides are guaranteed to be in double precision +So in +
```
+(double)(a+b)==(double)(a+b)
+
```
+both sides are guaranteed to be in double precision when the comparision is done.
(This still does not solve the x87 double rounding issue though: eg if the left side is evaluated with sse2 and the right side with x87 extended precision setting -and double rounding then the result may be false -and still conformant to C99) +and double rounding then the result may still be false)
Unfortunately gcc does not respect the standard and even if assingment or cast is used the result @@ -220,13 +285,18 @@ also see gcc bug3657 gcc 4.5 fixed it with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast') +(An alternative solution would be if gcc spilled the +registers with temporary results without rounding, +storing the 80 bit registers entirely in memory +which would make the behaviour under FLT_EVAL_METHOD==2 +mode more predictable)
The workaround for older gcc is to force the compiler to store the intermediate results: by using volatile double temporary variables -or by '-ffloat-store' (slow, and of course -all intermediate results should be assigned -to some variable, casting is not enough). +or by '-ffloat-store' (which affects all +intermediate results, but is not guaranteed +by the gcc docs to always work).
(Sometimes the excess precision is good but it's hard to rely on it as it is optional @@ -238,6 +308,40 @@ use higher precision variables when that's what we want and don't depend on the implicit excess precision). +
Float literals +
+The standard allows 1 ulp errors in the conversion +of decimal floating-point literals into floating-point +values (it only requires the same result for the same +literal thus 1.1 - 1.1 is always 0, +but 1.1 - 11e-1 maybe +-0x1p-52 or 0). +
+A reasonable compiler always use correctly rounded +conversion according to the default (nearest) rounding +mode, but there are exceptions: +the x87 has builtin constants which are faster to load +from hardware than from memory +(and the hw has sticky bit set correctly for rounding). +gcc can recognize these constants so an operation on +
```
+3.141592653589793238462643383L
+
```
+can turn into code that uses the fldpi +instruction instead of memory loads. +The only issue is that fldpi depends on +the current rounding mode at runtime +so the result can indeed be 1 ulp off compared +to the compile-time rounded value. +
+According to the freebsd libm code gcc truncates long double +const literals on i386. +I assume this happens because freebsd uses 64bit long doubles by default +(double precision) and gcc incorrectly uses the precision setting of the +host platform instead of the target one, but i did not observe this on linux. +(as a workaround sometimes double-double arithmetics was used +to initialize long doubles on i386, but most of these should be +fixed in musl's math code now) +
Compiler optimizations:
Runtime and compile time semantics may be different @@ -252,13 +356,15 @@ different precision than at runtime). C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). -Unfortunately gcc does not support these pragmas. +Unfortunately gcc does not support these pragmas +nor clang (clang bug 8100).
FENV_ACCESS ON tells the compiler that the code wants to access the floating point environment (eg. set different rounding mode) (see gcc bug34678).
-(see gcc bug37845 for FP_CONTRACT pragma). +(see gcc bug37845 for FP_CONTRACT pragma +which is relevant when the architecture has fma instruction, x86_64, i386, arm do not have it).
The workaround is again using named volatile variables for constants like @@ -266,45 +372,114 @@ variables for constants like static const volatile two52 = 0x1p52; and using the '-frounding-math' gcc flag. -
-(According the freebsd libm code gcc truncates -long double const literals on i386. -I haven't yet verified if this still the case, -but as a workaround sometimes double-double arithmetics is used: -initializing the long double constant from two doubles)
ld80 vs ld128
The two representations are sufficiently different that treating them together is awkward. +(Especially the explicit msb bit in ld80 can cause +different behaviour).
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately. -
Signed int -
-The freebsd libm code has many inconsistencies -(naming conventions, 0x1p0 notation vs decimal notation,..), -one of them is the integer type used for bitmanipulations: -The bits of a double are unpacked into one of -int32_t, uint32_t and u_int32_t -integer types. +
Signed zeros +
Signed zeros can be tricky. They cannot be checked +using the usual comparision operators (+0.0 == -0.0 is true), +but they give different results in some cases +(1/-0.0 == -inf) and can make similarly looking +expressions different eg. +(x + 0) is not the same as x, the former is +0.0 when x is -0.0 +
+(To check for -0, the signbit macro can be used +or the copysign function or bit manipulation.) + +
Error handling
-int32_t is used most often which is wrong because of -implementation defined signed int representation. +Arithmetics may set various floating point exception flags as a side effect. +These can be queried and manipulated (fetestexcept, feraiseexcept,..).
-In general signed int is not handled carefully -in the libm code: scalbn even depends on signed int overflow. +So special care is needed +when a library function wants to avoid changing +the floating point status flags. +eg. if one wants to check for -0 silently then +
```
+if (x == 0.0 && 1/x < 0) { /* x is a -0 */ }
+
```
+is not ok: == raises invalid exception when x is nan, +and even if we filter nans out 1/x will raise the +divbyzero flag. +
+When a library wants to raise a flag deliberately +but feraiseexcept is not available for some reason, +then simple arithmetics can be be used just for their +exception raising side effect +(eg. 1/0.0 to raise divbyzero), however beaware +of compiler optimizations (constant folding and dead code elimination,..). +
+Unfortunately gcc does not always take fp exceptions into +account: a simple x = 1e300*1e300; may not raise overflow +exception at runtime, but get optimized into x = +inf. +see compiler optimizations above. +
+Another x87 gcc bug related to fp exceptions is that in some cases +comparision operators (==, <, etc) don't raise invalid +when an operand is nan +(eventhough this is required by ieee + c99 annex F). +(see gcc bug52451). +
+The ieee standard defines signaling and quiet nan +floating-point numbers as well. +The c99 standard only considers quiet nan, but it allows +signaling nans to be supported as well. +Without signaling nans x * 1 is equivalent to x, +but if signaling nan is supported then the former +raises an invalid exception. +This may complicate things further if one wants to write +portable fp math code. +
+A further libm design issue is the math_errhandling macro: +it specifies the way math function errors can be checked +which is either fp exceptions or the errno variable or both. +The fp exception approach can be supported with enough care +but errno is hard to support: certain library functions +are implemented as a single asm instruction (eg sqrt), +the only way to set errno is to query the fp exception flags +and then set the errno variable based on that. +So eventhough errno may be convenient, in libm it is +not the right thing to do. +
+For soft-float targets however errno seems to be the only option +(which means annex K cannot be fully supported, as it requires +the support of exception flags). +The problem is that at context switches the fpu status should +be saved and restored which is done by the kernel on hard-fp +architectures when the state is in an fpu status word. +In case of soft-fp emulation this must be done by the c runtime: +context switches between threads can be supported with thread local +storage of the exception state, but signal handlers may do floating-point +arithmetics which should not alter the fenv state. +Wrapping signal handlers is not possible/difficult for various +reasons and the compiler cannot know which functions will be used +as signal handlers, so the c runtime has no way to guarantee that +signal handlers do not alter the fenv.
Complex arithmetics
-With gcc x*I turns into (x+0*I)*(0+I) = 0*x + x*I, -so if x=inf then the result is nan+inf*I instead of inf*I +gcc turns +
```
+x*I
+
```
+into +
```
+(x+0*I)*(0+I) = 0*x + x*I
+
```
+So if x=inf then the result is nan+inf*I instead of inf*I (an fmul instruction is generated for 0*x)
-(There were various complex constant folding issues as well). +(There were various complex constant folding issues as well in gcc).
So a+b*I cannot be used to create complex numbers, instead some double[2] manipulation should be used to @@ -323,6 +498,31 @@ It's not clear how I (or _Complex_I) should be defined in complex.h (literal of the float imaginary unit is compiler specific, in gcc it can be 1.0fi). + +
Signed int +
+The freebsd libm code has many inconsistencies +(naming conventions, 0x1p0 notation vs decimal notation,..), +one of them is the integer type used for bit manipulations: +The bits of a double are unpacked into one of +int, int32_t, uint32_t and u_int32_t +integer types. +
+int32_t is used the most often which is not wrong in itself +but it is used incorrectly in many places. +
+int is a bit worse because unlike int32_t it is not guaranteed +to be 32bit two's complement representation. (but of course in +practice they are the same) +
+The issues found so far are left shift of negative integers +(undefined behaviour), right shift of negative integers +(implementation defined behaviour), signed overflow +(implementation defined behaviour), unsigned to signed conversion +(implementation defined behaviour). +
+It is easy to avoid these issues without performance impact, +but a bit of care should be taken around bit manipulations.

libm implementations

@@ -378,6 +578,7 @@ various other closed ones: intel libm, hp libm for itanium,.. mpcheck

FPAccuracy

beebe's ieee tests +

elefunt

testlibm

fptest

tests in crlibm