X-Git-Url: http://nsz.repo.hu/git/?a=blobdiff_plain;f=libm%2Findex.html;h=2e1ccdec7ce3b6869e1991e4670e77a719ae65e4;hb=05b4cec71bfd2f9ff568c04afee9072a203b115e;hp=a9b76cf6bb9e14602536db8e17f3da51ef956605;hpb=e8a1b2016bef1b564b3964e1b2ea5bd0d2ac7f18;p=www diff --git a/libm/index.html b/libm/index.html index a9b76cf..2e1ccde 100644 --- a/libm/index.html +++ b/libm/index.html @@ -35,7 +35,7 @@ cvs -d $CVSROOT get src/li
|error| < 1.5 ulp-should hold. +should hold for most functions. (error is the difference between the exact result and the calculated floating-point value) (in theory correct rounding can be achieved but with big implementation cost, see crlibm)
Binary representation of floating point numbers matters because bit hacks are often needed in the math code. +(in particular bit hacks are used instead of relational operations for nan +and sign checks because relational operators raise the invalid fp exception on nan +and they treat -0.0 and +0.0 as equal, and more often than not these are not desired)
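As an illustration of such bit hacks, here is a minimal sketch of nan and sign checks done on the representation instead of with relational operators. It assumes that double is ieee binary64 and that it has the same byte order as uint64_t; the helper names (double_bits, is_nan_bits, sign_bit) are made up for this example and are not taken from any libm.

```c
#include <stdint.h>

/* Reinterpret the bits of a double through a union.
   Assumption: double is ieee binary64 with the byte order of uint64_t. */
static uint64_t double_bits(double x)
{
    union { double f; uint64_t i; } u = { x };
    return u.i;
}

/* nan has an all-ones exponent and a nonzero mantissa, so with the
   sign bit cleared its bits compare greater than those of +inf.
   No floating-point comparison is done, so no invalid exception. */
static int is_nan_bits(double x)
{
    return (double_bits(x) & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
}

/* Returns 1 for negative values including -0.0, which x < 0 cannot detect. */
static int sign_bit(double x)
{
    return (int)(double_bits(x) >> 63);
}
```

With these helpers sign_bit(-0.0) is 1 while -0.0 < 0 is false, which is exactly the distinction the relational operators cannot make.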
float and double bit manipulation can be handled in a portable way in c using union types: @@ -156,9 +161,11 @@ ld128 is rare (eg. sparc64 with software emulation), it means
-There are other non-conformant long double types: eg. ppc abi (both SVR4 and -the newer eabi) uses 128 bit long doubles, but it's software emulated using -(the newer ppc eabi uses ld64). +There are other non-conformant long double types: eg. the old SVR4 abi for ppc +uses 128 bit long doubles, but it's software emulated and traditionally +implemented using +two doubles +(also called ibm long double as this is what ibm aix used on ppc). The ibm s390 supports the ieee 754-2008 compliant binary128 floating-point format, but previous ibm machines (S/370, S/360) used slightly different representation. @@ -306,11 +313,14 @@ static const volatile two52 = 0x1p52; and using the '-frounding-math' gcc flag.
-(According the freebsd libm code gcc truncates -long double const literals on i386. -I haven't yet verified if this still the case, -but as a workaround double-double arithmetics is used: -initializing the long double constant from two doubles) +According to the freebsd libm code gcc truncates long double +const literals on i386. +I assume this happens because freebsd uses 64bit long doubles by default +(double precision) and gcc incorrectly uses the precision setting of the +host platform instead of the target one, but i did not observe this on linux. +(as a workaround sometimes double-double arithmetics was used +to initialize long doubles on i386, but most of these should be +fixed in musl's math code now)
@@ -356,27 +366,27 @@ but feraiseexcept is not available for some reason, then simple arithmetics can be used just for their exception raising side effect (eg. 1/0.0 to raise divbyzero), however be aware -of compiler optimizations (dead code elimination,..). +of compiler optimizations (constant folding and dead code elimination,..).
Unfortunately gcc does not always take fp exceptions into account: a simple x = 1e300*1e300; may not raise overflow exception at runtime, but get optimized into x = +inf. see compiler optimizations above.
-Another x87 gcc bug related to fp exceptions is that +Another x87 gcc bug related to fp exceptions is that in some cases comparison operators (==, <, etc) don't raise invalid when an operand is nan (even though this is required by ieee + c99 annex F). (see gcc bug52451).
-The ieee standard defines signaling and quite nan +The ieee standard defines signaling and quiet nan floating-point numbers as well. The c99 standard only considers quiet nan, but it allows signaling nans to be supported as well. Without signaling nans x * 1 is equivalent to x, but if signaling nan is supported then the former raises an invalid exception. -This may complicates things further if one wants to write +This may complicate things further if one wants to write portable fp math code.
A further libm design issue is the math_errhandling macro: @@ -387,8 +397,23 @@ but errno is hard to support: certain library functions are implemented as a single asm instruction (eg sqrt), the only way to set errno is to query the fp exception flags and then set the errno variable based on that. -So eventhough errno may be convenient in libm it is +So even though errno may be convenient, in libm it is not the right thing to do. +
+For soft-float targets however errno seems to be the only option +(which means annex F cannot be fully supported, as it requires +the support of exception flags). +The problem is that at context switches the fpu status should +be saved and restored which is done by the kernel on hard-fp +architectures when the state is in an fpu status word. +In case of soft-fp emulation this must be done by the c runtime: +context switches between threads can be supported with thread local +storage of the exception state, but signal handlers may do floating-point +arithmetics which should not alter the fenv state. +Wrapping signal handlers is impossible or difficult for various +reasons and the compiler cannot know which functions will be used +as signal handlers, so the c runtime has no way to guarantee that +signal handlers do not alter the fenv.
@@ -427,16 +452,26 @@ in gcc it can be 1.0fi).
The freebsd libm code has many inconsistencies (naming conventions, 0x1p0 notation vs decimal notation,..), -one of them is the integer type used for bitmanipulations: +one of them is the integer type used for bit manipulations: The bits of a double are unpacked into one of -int32_t, uint32_t and u_int32_t +int, int32_t, uint32_t and u_int32_t integer types.
-int32_t is used most often which is wrong because of -implementation defined signed int representation. +int32_t is used the most often which is not wrong in itself +but it is used incorrectly in many places. +
+int is a bit worse because unlike int32_t it is not guaranteed +to be 32bit two's complement representation. (but of course in +practice they are the same) +
+The issues found so far are left shift of negative integers +(undefined behaviour), right shift of negative integers +(implementation defined behaviour), signed overflow +(undefined behaviour), unsigned to signed conversion +(implementation defined behaviour).
-In general signed int is not handled carefully -in the libm code: scalbn even depends on signed int overflow. +It is easy to avoid these issues without performance impact, +but a bit of care should be taken around bit manipulations.