From 841a1c131bb74baa18d959a9d5d3da439ac2158f Mon Sep 17 00:00:00 2001 From: nsz Date: Thu, 20 Jun 2013 19:23:01 +0000 Subject: [PATCH] libm notes update (excess precision fixes) --- libm/index.html | 73 +++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 61 insertions(+), 12 deletions(-) diff --git a/libm/index.html b/libm/index.html index adad680..6eb718c 100644 --- a/libm/index.html +++ b/libm/index.html @@ -81,6 +81,9 @@ attention to the internal representation of a NaN FORCE_EVAL when expressions must be evaluated for their side-effect, other usage of volatile is not justified, hacks around long double constants are not justified eventhough gcc can miscompile those with non-default FPU setting) +
  • When excessive precision is not harmful, temporary variables +should be float_t or double_t (so on i386 no superfluous store is +generated)
  • Whenever fenv is accessed the FENV_ACCESS pragma of c99 should be used (eventhough gcc does not yet support it), and all usage of optional FE_ macros should be protected by #ifdef @@ -194,7 +197,7 @@ the results twice: round to 80bit when calculating and then round to 64bit when storing it, this can give different result than a single 64bit rounding. (on x86-linux the default fpu setting is to round the -results in extended precision, this only affects x87 instructions, not see2 etc) +results in extended precision, this only affects x87 instructions, not sse2 etc) (freebsd and openbsd use double precision by default)

    So x = a+b may give different results depending on @@ -206,6 +209,18 @@ but that's the most common one)

    C99 annex F prohibits double rounding, but that's non-normative. +

    +Note that the value of the result can only be ruined by +double rounding in nearest rounding mode, but the double +rounding issue haunts directed rounding modes as well: +raising the underflow flag might be omitted. +On x86 with downward rounding +

    +(double)(0x1p-1070 + 0x1p-2000L)
    +
    +does not raise underflow (only inexact) eventhough the +final result is an inexact subnormal. +
  • Wider exponent range (x87 issue):

    @@ -236,9 +251,12 @@ So may be false when the two sides are kept in different precision. (This is not an x87 specific problem, it matters whenever there -is a higher precision fp type than the currently used one. +is a higher precision fp type than the currently used one and +FLT_EVAL_METHOD!=0. It goes away if the highest precision (long double) is used everywhere, but that can have a huge penalty). +(clang uses sse by default on i386 with FLT_EVAL_METHOD==0, +while gcc uses the 80bit x87 fp registers and FLT_EVAL_METHOD==2)

    C99 has a way to control this (see 5.1.2.3 example 4, @@ -267,6 +285,11 @@ also see gcc bug3657 gcc 4.5 fixed it with '-fexcess-precision=standard' (it is enabled by '-std=c99', but the default is '-fexcess-precision=fast') +(An alternative solution would be if gcc spilled the +registers with temporary results without rounding, +storing the 80 bit registers entirely in memory +which would make the behaviour under FLT_EVAL_METHOD==2 +mode more predictable)

    The workaround for older gcc is to force the compiler to store the intermediate results: @@ -285,6 +308,40 @@ use higher precision variables when that's what we want and don't depend on the implicit excess precision). +

  • Float literals +

    +The standard allows 1 ulp errors in the conversion +of decimal floating-point literals into floating-point +values (it only requires the same result for the same +literal thus 1.1 - 1.1 is always 0, +but 1.1 - 11e-1 maybe +-0x1p-52 or 0). +

    +A reasonable compiler always use correctly rounded +conversion according to the default (nearest) rounding +mode, but there are exceptions: +the x87 has builtin constants which are faster to load +from hardware than from memory +(and the hw has sticky bit set correctly for rounding). +gcc can recognize these constants so an operation on +

    +3.141592653589793238462643383L
    +
    +can turn into code that uses the fldpi +instruction instead of memory loads. +The only issue is that fldpi depends on +the current rounding mode at runtime +so the result can indeed be 1 ulp off compared +to the compile-time rounded value. +

    +According to the freebsd libm code gcc truncates long double +const literals on i386. +I assume this happens because freebsd uses 64bit long doubles by default +(double precision) and gcc incorrectly uses the precision setting of the +host platform instead of the target one, but i did not observe this on linux. +(as a workaround sometimes double-double arithmetics was used +to initialize long doubles on i386, but most of these should be +fixed in musl's math code now) +

  • Compiler optimizations:

    Runtime and compile time semantics may be different @@ -299,7 +356,8 @@ different precision than at runtime). C99 actually allows most of these optimizations but they can be turned off with STDC pragmas (see 6.10.6). -Unfortunately gcc does not support these pragmas. +Unfortunately gcc does not support these pragmas +nor clang (clang bug 8100).

    FENV_ACCESS ON tells the compiler that the code wants to access the floating point environment (eg. set different rounding mode) @@ -314,15 +372,6 @@ variables for constants like static const volatile two52 = 0x1p52; and using the '-frounding-math' gcc flag. -

    -According to the freebsd libm code gcc truncates long double -const literals on i386. -I assume this happens because freebsd uses 64bit long doubles by default -(double precision) and gcc incorrectly uses the precision setting of the -host platform instead of the target one, but i did not observe this on linux. -(as a workaround sometimes double-double arithmetics was used -to initialize long doubles on i386, but most of these should be -fixed in musl's math code now)

  • ld80 vs ld128

    -- 2.20.1