X-Git-Url: http://nsz.repo.hu/git/?a=blobdiff_plain;f=libm%2Findex.html;h=b008b1d1ca171ae310af18c3d6208d95cf575dec;hb=343739050aa76ec798751844da15b1805e28feeb;hp=583255495c3af4a240176b7ae6a5c93753449dc9;hpb=2717c24f6ff5655e0e9fe561e690e0044e1a3bed;p=www diff --git a/libm/index.html b/libm/index.html index 5832554..b008b1d 100644 --- a/libm/index.html +++ b/libm/index.html @@ -7,6 +7,7 @@ so already existing code is used.
+git clone git://nsz.repo.hu:45100/repo/libm ++
+git clone http://nsz.repo.hu/repo/libm ++
+git clone ssh://anon@nsz.repo.hu:45022/repo/libm ++
The math code is mostly from freebsd which in turn is based on fdlibm. @@ -31,36 +49,17 @@ cvs -d $CVSROOT get src/li
The math code is available but there are still many design questions. +
There are architecture specific code, -long double representation specific code -and complex code. -
With respect to long double, bsd libm code can handle -ld64, ld80 and ld128 (where ld64 means long double == double), -for musl it's enough to support ld64 and ld80, but -keeping ld128 shouldn't hurt much either. -
So far musl had arch specific code and common code, -it seems a new category is needed: -long double specific code, which can be ld64, ld80 or ld128. -These should be tied to the architecture in the build system -somehow. In freebsd most of the long double code is -common (using arch specific macros, ifdefs) and only a -small part of the ld code is maintained separately for ld80 and ld128, -but it's also possible to keep all ld code separately (with duplications) -in ld80 and ld128. -
Complex support is optional in c99 so maybe it should -be optional in the musl build as well. It also can be -kept separately in cmath/ or with the rest of the code in math/. +
+The long double code uses ifdefs and macro hacks to allow
+ld64, ld80 and ld128 implementations in the same file.
Pending questions:
The fdlibm code style is inconsistent with musl (and ugly) -so it makes sense to clean it up, but keeping easy diffability -can be useful. There are various inconsistencies and -correctness issues in the code which might worth addressing -(eg. reliance on signed overflow should be eliminated). +
+The fdlibm code has code style and correctness issues
+which might be worth addressing.
Pending questions:
The endianness may still vary, but that can be worked around by using a union with a single large enough @@ -123,8 +121,8 @@ various representations: (and other non-standard formats are not supported)
ld64 is easy to handle: all long double functions -are aliased to the corresponding double one -(long double is treated equivalently to double) +are just wrappers around the corresponding double ones +(aliasing is non-conformant)
ld80 is the most common (i386, x86_64), it means 64bit significand with explicit msb (inconsistent with other ieee formats), @@ -136,7 +134,9 @@ ld128 is rare (sparc64 with software emulation), it means Endianness can vary (although on the supported i386 and x86_64 it is the same) and there is no large enough unsigned int to handle it. (In case of ld80 internal padding can vary as well, eg -m68k and m88k cpus use different ld80 than the intel ones) +m68k and m88k cpus use different ld80 than the intel ones.) +So there can be further variations in the binary representation +than just ld80 and ld128.
Some notes are from: http://www.vinc17.org/research/extended.en.html -
If a value is rounded twice, the result can be different
from rounding just once.
@@ -159,15 +158,17 @@ and then round to 64bit when storing it, this can give
a different result than a single 64bit rounding.
(on x86-linux the default fpu setting is to round
the results in extended precision, this only affects
x87 instructions, not sse2 etc)
-(afaik freebsd and openbsd use double precision by default)
+(freebsd and openbsd use double precision by default)
So x = a+b may give different results depending on -the fpu setting. +the x87 fpu precision setting. (only happens in round to nearest rounding mode, but that's the most common one)
-C99 annex F prohibits double rounding, -but that's non-normative. +(double rounding can happen with float vs double as well) +
+C99 annex F +prohibits double rounding, but that's non-normative.
@@ -188,6 +189,9 @@ C does not require consistent evaluation precision:
the compiler may store intermediate results and round them
to double while keeping other parts in higher precision.
+(And the precision the compiler chooses can
+be inconsistent: adding a printf to the code
+may change the result of a nearby calculation).
So
(a+b)==(a+b)
@@ -216,8 +220,7 @@ when the comparison is done.
(This still does not solve the x87 double rounding
issue though: eg if the left side is evaluated with
sse2 and the right side with x87 extended precision setting
-and double rounding then the result may be false
-and still conformant to C99)
+and double rounding then the result may still be false)

Unfortunately gcc does not respect the
standard and even if assignment or cast is used the result
@@ -231,9 +234,9 @@ gcc 4.5 fixed it with '-fexcess-precision=standard'
The workaround for older gcc is to force the
compiler to store the intermediate results:
by using volatile double temporary variables
-or by '-ffloat-store' (slow, and of course
-all intermediate results should be assigned
-to some variable, casting is not enough).
+or by '-ffloat-store' (which affects all
+intermediate results, but is not guaranteed
+by the gcc docs to always work).
(Sometimes the excess precision is good but it's hard to rely on it as it is optional @@ -259,13 +262,14 @@ different precision than at runtime). C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). -Unfortunately gcc does not support these pragmas. +Unfortunately gcc does not support these pragmas.
FENV_ACCESS ON tells the compiler that the code wants to access the floating point environment (eg. set different rounding mode) (see gcc bug34678).
-(see gcc bug37845 for FP_CONTRACT pragma).
+(see gcc bug37845 for the FP_CONTRACT pragma,
+which is relevant when the architecture has an fma instruction;
+x86_64, i386 and arm do not have one).
The workaround is again using named
volatile variables for constants like
@@ -277,33 +281,86 @@ and using the '-frounding-math'
gcc flag.
(According to the freebsd libm code gcc truncates
long double const literals on i386.
I haven't yet verified if this is still the case,
-but as a workaround sometimes double-double arithmetics is used:
+but as a workaround double-double arithmetic is used:
initializing the long double constant from two doubles)
The two representations are sufficiently different that
treating them together is awkward.
+(Especially the explicit msb in ld80 can cause
+different behaviour.)
In the freebsd libm code a few architecture specific macros and a union handle these issues, but the result is often less clear than treating ld80 and ld128 separately. -
-The freebsd libm code has many inconsistencies -(naming conventions, 0x1p0 notation vs decimal notation,..), -one of them is the integer type used for bitmanipulations: -The bits of a double are unpacked into one of -int32_t, uint32_t and u_int32_t -integer types. +
Signed zeros can be tricky. They cannot be checked
+using the usual comparison operators (+0.0 == -0.0 is true),
+but they give different results in some cases
+(1/-0.0 == -inf) and can make similar looking
+expressions behave differently, eg.
+(x + 0) is not the same as x: the former is
++0.0 when x is -0.0.
+(To check for -0, the signbit macro can be used +or the copysign function or bit manipulation.) + +
-int32_t is used most often which is wrong because of
-implementation defined signed int representation.
+Arithmetic may set various floating point exception flags as a side effect.
+These can be queried and manipulated (fetestexcept, feraiseexcept,..).
-In general signed int is not handled carefully -in the libm code: scalbn even depends on signed int overflow. +So special care is needed +when a library function wants to avoid changing +the floating point status flags. +eg. if one wants to check for -0 silently then +
+if (x == 0.0 && 1/x < 0) { /* x is a -0 */ }
+
+is not ok: == raises an invalid exception when x is nan,
+and even if we filter out nans, 1/x will raise the
+divbyzero flag.
+When a library wants to raise a flag deliberately
+but feraiseexcept is not available for some reason,
+then simple arithmetic can be used just for its
+exception raising side effect
+(eg. 1/0.0 to raise divbyzero), however be aware
+of compiler optimizations (dead code elimination,..).
+Unfortunately gcc does not always take fp exceptions into
+account: a simple x = 1e300*1e300; may not raise the overflow
+exception at runtime, but get optimized into x = +inf;
+see compiler optimizations above.
+Another x87 gcc bug related to fp exceptions is that
+comparison operators (==, <, etc) don't raise invalid
+when an operand is nan
+(even though this is required by ieee + c99 annex F).
+(see gcc bug52451).
+The ieee standard defines signaling and quiet nan
+floating-point numbers as well.
+The c99 standard only considers quiet nans, but it allows
+signaling nans to be supported as well.
+Without signaling nans x * 1 is equivalent to x,
+but if signaling nans are supported then the former
+raises an invalid exception when x is a signaling nan.
+This may complicate things further if one wants to write
+portable fp math code.
+A further libm design issue is the math_errhandling macro:
+it specifies the way math function errors can be checked,
+which is either fp exceptions or the errno variable or both.
+The fp exception approach can be supported with enough care,
+but errno is hard to support: certain library functions
+are implemented as a single asm instruction (eg sqrt),
+so the only way to set errno is to query the fp exception flags
+and then set the errno variable based on that.
+So even though errno may be convenient in libm it is
+not the right thing to do.
@@ -337,6 +394,21 @@ It's not clear how I (or _Complex_I) should be defined in complex.h (literal of the float imaginary unit is compiler specific, in gcc it can be 1.0fi). + +
+The freebsd libm code has many inconsistencies
+(naming conventions, 0x1p0 notation vs decimal notation,..);
+one of them is the integer type used for bit manipulations:
+the bits of a double are unpacked into one of the
+int32_t, uint32_t and u_int32_t
+integer types.
+int32_t is used most often, which is wrong because
+the representation of signed int is implementation defined.
+In general signed int is not handled carefully +in the libm code: scalbn even depends on signed int overflow.