X-Git-Url: http://nsz.repo.hu/git/?a=blobdiff_plain;f=libm%2Findex.html;h=2e1ccdec7ce3b6869e1991e4670e77a719ae65e4;hb=05b4cec71bfd2f9ff568c04afee9072a203b115e;hp=e65d7f1558435203fe0127ff66066288ea8b0fe8;hpb=5a0b6b66298a9c31e2c7a950d4346a9d8401f991;p=www

libm

libm

This page is about libm for the
musl libc.


Sources

Writing math code from scratch is a huge work, so already existing code is
used. Several math functions are taken from the
freebsd libm and a few from the
openbsd libm implementations.
Both of them are based on fdlibm.
The freebsd libm seems to be the best maintained and most correct
version of fdlibm.

sources:


Design issues

General rules


  • Assumptions about floating-point representation and arithmetic
(see c99 annex F.2):
(other long double representations may be supported in the future; until then
long double math functions will be missing on non-supported platforms)
  • On soft-float architectures fenv should not be expected to work according
to the c and ieee standards (ie. rounding modes and exceptions are not
supported, the fenv functions are dummy ones).
  • Floating-point exception flags should be properly set in math functions
according to c99 annex F, but without using fenv.h
(eg. the overflow flag can be raised by 0x1p900*0x1p900, because this works
even without fenv support).
  • Most functions need to be precise only in nearest rounding mode.
  • Returned floating-point values should be correctly rounded in most cases,
but the last bit may be wrong:

    |error| < 1.5 ulp

should hold for most functions.
(error is the difference between the exact result and the calculated
floating-point value)
(in theory correct rounding can be achieved, but with a big implementation
cost, see crlibm)
  • At least the following functions must be correctly rounded:
ceil, copysign, fabs, fdim, floor, fma, fmax, fmin, frexp, ldexp, logb,
modf, nearbyint, nextafter, nexttoward, rint, round, scalbln, scalbn,
sqrt, trunc.
  • Mathematical properties of functions should be as expected
(monotonicity, range, symmetries).
  • If the FPU precision is altered then nothing is guaranteed to work
(ie. when long double does not have full 80bit precision on i386 then things
may break; this also means that the compiler must spill fpu registers at
80bit precision).
  • Signaling NaN is not supported.
  • Quiet NaN is supported, but all NaNs are treated equally without special
attention to the internal representation of a NaN
(eg. the sign of a NaN may not be preserved).
  • Most gcc bug workarounds should be removed from the code
(the STRICT_ASSIGN macro is used when excessive precision is harmful and
FORCE_EVAL when expressions must be evaluated for their side effect; other
usage of volatile is not justified, and hacks around long double constants
are not justified even though gcc can miscompile those with a non-default
FPU setting).
  • Whenever fenv is accessed the FENV_ACCESS pragma of c99 should be used
(even though gcc does not yet support it), and all usage of optional FE_
macros should be protected by #ifdef.
  • For bit manipulation of floating-point values a union should be used
(eg. union {float f; uint32_t i;}).
  • uint32_t and uint64_t should be used for bit manipulations
(eg. signed int must not be used in bit shifts etc when it might invoke
undefined or implementation defined behaviour).
  • POSIX namespace rules must be respected.
  • c99 hexfloat syntax (0x1.0p0) should be used when it makes the code
clearer, but not in public header files
(those should be c++ and ansi c compatible).
  • The 'f' suffix should be used for single precision values (0.1f) when the
value cannot be exactly represented or the type of the arithmetic is important
((float)0.1 is not ok, as that style may lead to double rounding issues, but
eg. 1.0 or 0.5 may be used instead of 1.0f or 0.5f in some cases).
  • Prefer classification macros (eg. isnan) over in-place bit hacks.
  • For every math function there should be a c implementation
(a notable exception now is sqrtl, since most fpus have an instruction for it
and on soft-float architectures long double == double).
  • The c implementation of a long double function should use ifdefs with the
LDBL_MANT_DIG etc constants from float.h for architecture specific
implementations.
  • In the musl source tree math.h functions go to src/math, complex.h
functions to src/complex and fenv.h functions to src/fenv, and files are
named after the functions they implement.

    Representation

The binary representation of floating point numbers matters
because bit hacks are often needed in the math code.
(in particular bit hacks are used instead of relational operations for nan
and sign checks because relational operators raise the invalid fp exception
on nan and treat -0.0 and +0.0 equally, and more often than not these are
not desired)

float and double bit manipulation can be handled in a portable way in c using
union types:


(assuming the bits in the object representation of 32bit and 64bit unsigned
ints map to the floating-point representation according to ieee-754; this is
not always the case, eg. the old
arm floating-point accelerator
(FPA) used a mixed endian double representation, but musl does not support
the old arm ABI)

long double bit manipulation is harder as there are various representations
and some of them don't map to any unsigned integer type:

    (and other non-standard formats are not supported)

In case of ld64 the bit manipulation is the same as with double,
and all long double math functions can be just wrappers around the
corresponding double ones.
(using symbol aliasing at the linker level is non-conformant
since the functions would not have unique addresses then)

ld80 is the most common long double on linux (i386 and x86_64 abi),
it means 64bit significand with explicit msb
(inconsistent with other ieee formats), 15bit exp, 1 sign bit.
The m68k (and m88k) architecture uses the same format, but different
endianness:

where m is the significand and se is the sign and exponent.

ld128 is rare (eg. sparc64 with software emulation), it means
113bit significand with implicit msb, 15bit exp, 1 sign bit:

There are other non-conformant long double types: eg. the old SVR4 abi for
ppc uses 128 bit long doubles, but it is software emulated and traditionally
implemented using
two doubles
(also called ibm long double as this is what ibm aix used on ppc).
The ibm s390 supports the ieee 754-2008 compliant binary128 floating-point
format, but previous ibm machines (S/370, S/360) used a slightly different
representation.

This variation shows the difficulty of handling long double consistently:
the solution is to use ifdefs based on float.h and on the endianness, and
to write different code for different architectures.

    Ugly

The ugly parts of libm hacking.

[...]

static const volatile two52 = 0x1p52;

and using the '-frounding-math' gcc flag.

According to the freebsd libm code gcc truncates long double
const literals on i386.
I assume this happens because freebsd uses 64bit long doubles by default
(double precision) and gcc incorrectly uses the precision setting of the
host platform instead of the target one, but I did not observe this on linux.
(as a workaround sometimes double-double arithmetic was used
to initialize long doubles on i386, but most of these should be
fixed in musl's math code now)

  • ld80 vs ld128

[...]

but feraiseexcept is not available for some reason,
then simple arithmetic can be used just for its
exception raising side effect
(eg. 1/0.0 to raise divbyzero), however be aware
of compiler optimizations (constant folding and dead code elimination, ..).

Unfortunately gcc does not always take fp exceptions into
account: a simple x = 1e300*1e300; may not raise the overflow
exception at runtime, but may get optimized into x = +inf.
See compiler optimizations above.

Another x87 gcc bug related to fp exceptions is that
in some cases comparison operators (==, <, etc) don't raise
invalid when an operand is nan
(even though this is required by ieee + c99 annex F).
(see gcc bug52451)

The ieee standard defines signaling and quiet nan
floating-point numbers as well.
The c99 standard only considers quiet nan, but it
allows signaling nans to be supported as well.
Without signaling nans x * 1 is equivalent to x, but
if signaling nan is supported then the former raises
an invalid exception.
This may complicate things further if one wants to write
portable fp math code.

A further libm design issue is the math_errhandling macro:

[...]

but errno is hard to support: certain library functions
are implemented as a single asm instruction (eg. sqrt),
so the only way to set errno is to query the fp exception
flags and then set the errno variable based on that.
So even though errno may be convenient, in libm it is
not the right thing to do.

For soft-float targets however errno seems to be the only option
(which means annex F cannot be fully supported, as it requires
the support of exception flags).
The problem is that at context switches the fpu status should
be saved and restored, which is done by the kernel on hard-fp
architectures when the state is in an fpu status word.
In case of soft-fp emulation this must be done by the c runtime:
context switches between threads can be supported with thread local
storage of the exception state, but signal handlers may do floating-point
arithmetic which should not alter the fenv state.
Wrapping signal handlers is difficult or impossible for various
reasons, and the compiler cannot know which functions will be used
as signal handlers, so the c runtime has no way to guarantee that
signal handlers do not alter the fenv.

  • Complex arithmetic

[...]

in gcc it can be 1.0fi).

The freebsd libm code has many inconsistencies
(naming conventions, 0x1p0 notation vs decimal notation, ..);
one of them is the integer type used for bit manipulations:
the bits of a double are unpacked into one of the
int, int32_t, uint32_t and u_int32_t
integer types.

int32_t is used most often, which is not wrong in itself,
but it is used incorrectly in many places.

int is a bit worse because unlike int32_t it is not guaranteed
to be a 32bit two's complement representation. (but of course in
practice they are the same)

The issues found so far are left shift of negative integers
(undefined behaviour), right shift of negative integers
(implementation defined behaviour), signed overflow
(undefined behaviour) and unsigned to signed conversion
(implementation defined behaviour).

It is easy to avoid these issues without performance impact,
but a bit of care should be taken around bit manipulations.

    libm implementations