libm notes update (excess precision fixes)

[www] / libm / index.html
diff --git a/libm/index.html b/libm/index.html

index adad680..6eb718c 100644 (file)
--- a/libm/index.html
+++ b/libm/index.html
@@ -81,6 +81,9 @@ attention to the internal representation of a NaN
  FORCE_EVAL when expressions must be evaluated for their side-effect, other
  usage of volatile is not justified, hacks around long double constants are
  not justified eventhough gcc can miscompile those with non-default FPU setting)
  FORCE_EVAL when expressions must be evaluated for their side-effect, other
  usage of volatile is not justified, hacks around long double constants are
  not justified eventhough gcc can miscompile those with non-default FPU setting)
+<li>When excessive precision is not harmful, temporary variables
+should be float_t or double_t (so on i386 no superfluous store is
+generated)
  <li>Whenever fenv is accessed the FENV_ACCESS pragma of c99 should be used
  (eventhough gcc does not yet support it), and all usage of optional FE_
  macros should be protected by #ifdef
  <li>Whenever fenv is accessed the FENV_ACCESS pragma of c99 should be used
  (eventhough gcc does not yet support it), and all usage of optional FE_
  macros should be protected by #ifdef
@@ -194,7 +197,7 @@ the results twice: round to 80bit when calculating
  and then round to 64bit when storing it, this can
  give different result than a single 64bit rounding.
  (on x86-linux the default fpu setting is to round the
  and then round to 64bit when storing it, this can
  give different result than a single 64bit rounding.
  (on x86-linux the default fpu setting is to round the
-results in extended precision, this only affects x87 instructions, not see2 etc)
+results in extended precision, this only affects x87 instructions, not sse2 etc)
  (freebsd and openbsd use double precision by default)
  <p>
  So x = a+b may give different results depending on
  (freebsd and openbsd use double precision by default)
  <p>
  So x = a+b may give different results depending on
@@ -206,6 +209,18 @@ but that's the most common one)
  <p>
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#F.7.3">C99 annex F</a>
  prohibits double rounding, but that's non-normative.
  <p>
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#F.7.3">C99 annex F</a>
  prohibits double rounding, but that's non-normative.
+<p>
+Note that the value of the result can only be ruined by
+double rounding in nearest rounding mode, but the double
+rounding issue haunts directed rounding modes as well:
+raising the underflow flag might be omitted.
+On x86 with downward rounding
+<pre>
+(double)(0x1p-1070 + 0x1p-2000L)
+</pre>
+does not raise underflow (only inexact) eventhough the
+final result is an inexact subnormal.
+
  
  <li>Wider exponent range (x87 issue):
  <p>
  
  <li>Wider exponent range (x87 issue):
  <p>
@@ -236,9 +251,12 @@ So
  may be false when the two sides
  are kept in different precision.
  (This is not an x87 specific problem, it matters whenever there
  may be false when the two sides
  are kept in different precision.
  (This is not an x87 specific problem, it matters whenever there
-is a higher precision fp type than the currently used one.
+is a higher precision fp type than the currently used one and
+FLT_EVAL_METHOD!=0.
  It goes away if the highest precision (long double) is used
  everywhere, but that can have a huge penalty).
  It goes away if the highest precision (long double) is used
  everywhere, but that can have a huge penalty).
+(clang uses sse by default on i386 with FLT_EVAL_METHOD==0,
+while gcc uses the 80bit x87 fp registers and FLT_EVAL_METHOD==2)
  <p>
  C99 has a way to control this (see
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#5.1.2.3">5.1.2.3 example 4</a>,
  <p>
  C99 has a way to control this (see
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#5.1.2.3">5.1.2.3 example 4</a>,
@@ -267,6 +285,11 @@ also see <a href="http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36578">gcc bug3657
  gcc 4.5 fixed it with '-fexcess-precision=standard'
  (it is enabled by '-std=c99', but the default is
  '-fexcess-precision=fast')
  gcc 4.5 fixed it with '-fexcess-precision=standard'
  (it is enabled by '-std=c99', but the default is
  '-fexcess-precision=fast')
+(An alternative solution would be if gcc spilled the
+registers with temporary results without rounding,
+storing the 80 bit registers entirely in memory
+which would make the behaviour under FLT_EVAL_METHOD==2
+mode more predictable)
  <p>
  The workaround for older gcc is to force the
  compiler to store the intermediate results:
  <p>
  The workaround for older gcc is to force the
  compiler to store the intermediate results:
@@ -285,6 +308,40 @@ use higher precision variables when that's
  what we want and don't depend on the implicit
  excess precision).
  
  what we want and don't depend on the implicit
  excess precision).
  
+<li>Float literals
+<p>
+The standard allows 1 ulp errors in the conversion
+of decimal floating-point literals into floating-point
+values (it only requires the same result for the same
+literal thus <tt>1.1 - 1.1</tt> is always 0,
+but <tt>1.1 - 11e-1</tt> maybe +-0x1p-52 or 0).
+<p>
+A reasonable compiler always use correctly rounded
+conversion according to the default (nearest) rounding
+mode, but there are exceptions:
+the x87 has builtin constants which are faster to load
+from hardware than from memory
+(and the hw has sticky bit set correctly for rounding).
+gcc can recognize these constants so an operation on
+<pre>
+3.141592653589793238462643383L
+</pre>
+can turn into code that uses the <tt>fldpi</tt>
+instruction instead of memory loads.
+The only issue is that fldpi depends on
+the current rounding mode at runtime
+so the result can indeed be 1 ulp off compared
+to the compile-time rounded value.
+<p>
+According to the freebsd libm code gcc truncates long double
+const literals on i386.
+I assume this happens because freebsd uses 64bit long doubles by default
+(double precision) and gcc incorrectly uses the precision setting of the
+host platform instead of the target one, but i did not observe this on linux.
+(as a workaround sometimes double-double arithmetics was used
+to initialize long doubles on i386, but most of these should be
+fixed in musl's math code now)
+
  <li>Compiler optimizations:
  <p>
  Runtime and compile time semantics may be different
  <li>Compiler optimizations:
  <p>
  Runtime and compile time semantics may be different
@@ -299,7 +356,8 @@ different precision than at runtime).
  C99 actually allows most of these optimizations
  but they can be turned off with STDC pragmas (see
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.10.6">6.10.6</a>).
  C99 actually allows most of these optimizations
  but they can be turned off with STDC pragmas (see
  <a href="http://repo.or.cz/w/c-standard.git/blob_plain/HEAD:/n1256.html#6.10.6">6.10.6</a>).
-Unfortunately <a href="http://gcc.gnu.org/c99status.html">gcc does not support these pragmas</a>.
+Unfortunately <a href="http://gcc.gnu.org/c99status.html">gcc does not support these pragmas</a>
+nor clang (<a href="http://llvm.org/bugs/show_bug.cgi?id=8100">clang bug 8100</a>).
  <p>
  FENV_ACCESS ON tells the compiler that the code wants
  to access the floating point environment (eg. set different rounding mode)
  <p>
  FENV_ACCESS ON tells the compiler that the code wants
  to access the floating point environment (eg. set different rounding mode)
@@ -314,15 +372,6 @@ variables for constants like
  static const volatile two52 = 0x1p52;
  </pre>
  and using the '-frounding-math' gcc flag.
  static const volatile two52 = 0x1p52;
  </pre>
  and using the '-frounding-math' gcc flag.
-<p>
-According to the freebsd libm code gcc truncates long double
-const literals on i386.
-I assume this happens because freebsd uses 64bit long doubles by default
-(double precision) and gcc incorrectly uses the precision setting of the
-host platform instead of the target one, but i did not observe this on linux.
-(as a workaround sometimes double-double arithmetics was used
-to initialize long doubles on i386, but most of these should be
-fixed in musl's math code now)
  
  <li>ld80 vs ld128
  <p>
  
  <li>ld80 vs ld128
  <p>