libm: design and notes for a freebsd based implementation

author nsz <nsz@port70.net>

Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)

committer nsz <nsz@port70.net>

Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)
author nsz <nsz@port70.net>
Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)
committer nsz <nsz@port70.net>
Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)
diff --git a/libm.txt b/libm.txt

index b1bd951..7d2d44d 100644 (file)
--- a/libm.txt
+++ b/libm.txt
@@ -1,3 +1,307 @@
+sources
+-------
+
+fdlibm sources
+(freebsd is most up to date probably, but some code is
+only available from openbsd)
+
+freebsd /lib/msun
+svn checkout svn://svn.freebsd.org/base/head/lib/msun
+
+openbsd /src/lib/libm
+cvs -d anoncvs@anoncvs.eu.openbsd.org:/cvs get src/lib/libm
+
+
+code organization
+-----------------
+
+math code needs some special code organization
+or build system tweaks
+
+* long double:
+(representation issues are below)
+code organization constraints:
+1.) internal .h to bit manipulate float and double
+2.) long double representation specific .h
+3.) long double representation specific .c
+4.) common long double code for all representations
+5.) no long double code when long double == double
+6.) arch specific .s
+
+1. can be put in src/internal
+2. and 3. can be handled like arch specific code
+(separate directories for different long double
+representations and then somehow hook them up
+in the build system)
+4. can be regular source files in math/ but needs
+ifdefs due to 5.
+all double functions need an extra wrapper/alias
+to solve 5.
+6. can be done the usual way
+
+dir tree:
+/include/math.h
+/include/complex.h
+/arch/ld80/ldouble.h - duplicate per arch ?
+/arch/ld128/ldouble.h
+/src/internal/libm.h
+/src/math/ld80/*l.c
+/src/math/ld128/*l.c
+/src/math/arm/*.s
+/src/math/i386/*.s
+/src/math/x86_64/*.s
+/src/math/*.c
+
+in freebsd most of the long double code is common
+to the different representations but there are some
+representation specific files in ld80/ and ld128/
+
+it is possible to have all the long double code
+duplicated and maintained separately in ld80/ and
+ld128/ (or just ld80/ as ld128/ is rarely needed),
+the implementations always have a few differences
+(eventhough most of the logic is the same and the
+constants only differ in precision etc)
+so 4. can be mostly eliminated
+or have no separate ld80/ and ld128/ but use ifdefs
+and macros in the *l.c codes
+os 3. can be eliminated
+
+* complex:
+it makes sense to make complex code optional
+(compiler might not support it)
+this needs some hacks: ifdefed sources or
+some buildsystem hack
+
+* code style and naming:
+fdlibm has many ugly badly indented code
+it makes sense to clean them up a bit if we
+touch them anyway, but it's also good if the
+code can be easily compared to other fdlibms..
+
+fdlibm uses e_ s_ k_ w_ source code prefixes
+(i guess they stand for ieee, standard c, kernel and
+wrapper functions, but i'm not sure, i'd rather drop
+these if fdlibm diffability was not important)
+
+
+representation
+--------------
+
+float and double bit manipulation can be handled
+in a portable way in c
+
+float is ieee binary32
+double is ieee binary64
+(assuming old non-standard representations are not supported)
+
+the endianness may still vary, but that can be worked
+around by using a union with a single large enough
+unsigned int
+
+long double bit manipulation is harder as there are
+various representations:
+1.) same as double: ieee binary64
+2.) ld80: 80bit extended precision format
+3.) ld128: quadruple precision, ieee binary128
+(assuming other non-standard formats are not supported)
+
+case 1. is easy to handle: all long double functions
+are aliased to the corresponding double one
+(long double is treated equivalently to double)
+
+case 2. is the most common (i386, x86_64, emulation)
+case 3. is rare and means software emulation (sparc64)
+
+ld80 means 64bit mant, explicit msb, 15bit exp, 1 sign bit
+ld128 means 113bit mant, implicit msb, 15bit exp, 1 sign bit
+
+endianness can vary (and there is no large enough unsigned
+int to handle it), in case of ld80 padding can vary as well
+(eg the 80bit float of m68k and m88k cpus has different
+padding than the intel one)
+
+supporting ld80 and ld128 with the same code can be done,
+but requres ifdefs and macro hacks, treating them separately
+means many code duplications
+
+
+ugly
+----
+
+some notes are from:
+http://www.vinc17.org/research/extended.en.html
+
+1.) double rounding
+the c language allows arithmetic to be evaluated in
+higher precision and on x86 linux the default fpu
+setting is to round the results in extended precision
+(only x87 instructions are affected, sse2 is not)
+(freebsd and openbsd use double precision by default)
+
+rounding a result in extended precision first then
+rounding it to double when storing it may give
+different results than a single rounding to double
+so x = a+b may give different results depending on
+the fpu setting
+(only happens in round to nearest rounding mode,
+but that's the most common)
+
+2.) wider exponent range
+even if the fpu is set to double precision
+(which is not) the x87 registers use wider exponent
+range so underflows (subnormals) may not be treated
+according to the ieee standard (unless every
+computation result is stored and rounded to double)
+
+3.) precision of intermediate results
+c does not require consistent evaluation
+precision: the compiler may store intermediate
+results and round them to double while keep
+other parts in extended precision
+so (a+b)==(a+b) may be false when the two sides
+are kept in different precision
+
+c99 has a way to control this:
+when a result is stored in a variable or a
+type cast is used, then it is guaranteed that
+the precision is appropriate to that type
+so in (double)(a+b)==(double)(a+b) both
+sides are guaranteed to be in double precision
+when the comparision is done
+(this still does not solve 1. though: eg if
+the left side is evaluated with sse2 and the
+right side with x87 then the result may be false)
+
+unfortunately gcc does not respect the standard
+(infamous gcc 323 bug)
+gcc 4.5 fixed it with '-fexcess-precision=standard'
+(it is enabled by '-std=c99', but the default is
+'-fexcess-precision=fast')
+
+the workaround for older gcc is to force the
+compiler to store the intermediate results:
+by using volatile double temporary variables
+or by '-ffloat-store' (slow)
+
+(sometimes the excess precision is good
+but it's hard to rely on it as it is optional
+
+there is a way to check for it though using
+FLT_EVAL_METHOD, float_t and double_t
+
+but even then it's probably easier to
+unconditionally cast the variables of an
+expression to a higher precision when
+that's what we want and don't depend
+on the implicit excess precision)
+
+4.) runtime vs compile time
+runtime and compile time semantics may be different
+gcc often does unwanted or even invalid compile
+time optimizations
+(constant folding where floating point exception
+flags should be raised at runtime, or the result
+depends on runtime floating point environment,
+or constant expressions are just evaluated with
+different precision than at runtime)
+
+the workaround is again using named volatile
+variables like
+static const volatile two52 = 0x1p52;
+
+(?) old gcc truncated long double const expressions
+on x86, the workaround is to put the constant
+together from doubles at runtime
+
+5.) ld80 vs ld128 bit representations (see above)
+the two representations are sufficiently different
+that treating them together is awkward
+
+ld80 does not have implicit msb and the mantissa is
+best handled by a single uint64_t
+ld128 has implicit msb and the mantissa cannot be
+handled in a single unsigned int
+
+(typical operations:
+get/set exp and sign bit (same for ld80 and ld128)
+check x==0, mant==0 (ld80: explicit bit should be masked)
+check |x| < 1, 0.5 (exp + mant manipulation)
+check if subnormal
+check if integral)
+
+in the freebsd fdlibm code a few architecture specific
+macros and a union handle these issues
+
+6.) signed int
+the fdlibm code has many inconsistencies
+naming conventions, 0x1p0 notation vs decimal notation,
+..and integer type used for bitmanipulation
+
+when the bits of a double is manipulated it is unpacked
+into two 32bit words, which can be
+int32_t, uint32_t, u_int32_t
+
+int32_t is used most often which is wrong because of
+implementation defined signed int representation
+(in general signed int is not handled carefully
+scalbn even depends on signed int overflow)
+
+7.) complex is not implemented in gcc properly
+
+x*I turns into (x+0*I)*(0+I) = 0*x + x*I
+so if x=inf then the result is nan+inf*I instead of inf*I
+(an fmul instruction is generated for 0*x)
+
+so a+b*I cannot be used to create complex numbers
+instead some double[2] manipulation should be used to
+make sure there is no fmul in the generated code
+(freebsd uses a static inline cpack function)
+
+it's not clear which is better: using union hacks
+everywhere in the complex code or creal, cimag, cpack
+
+union dcomplex {
+       double complex z;
+       double a[2];
+};
+// for efficiency these should be macros or inline functions
+#define creal(z) ((double)(z))
+#define cimag(z) ((union dcomplex){(z)}.a[1])
+#define cpack(x,y) ((union dcomplex){.a={(x),(y)}}.z)
+
+eg. various implementations for conj:
+
+// proper implementation, but gcc does not do the right thing
+double complex conj(double complex z) {
+       return creal(z) - cimag(z)*I;
+}
+
+// using cpack
+double complex conj(double complex z) {
+       return cpack(creal(z), -cimag(z));
+}
+
+// explicit work around
+double complex conj(double complex z) {
+       union dcomplex u = {z};
+
+       u.a[1] = -u.a[1];
+       return u.z;
+}
+
+8.) complex code is hard to make portable across compilers
+
+eg it's not clear how I (or _Complex_I) should be defined
+in complex.h
+(literal of the float imaginary unit is compiler specific,
+in gcc it can be 1.0fi)
+
+complex support is not mandatory in c99 conformant
+implementations and even if complex is supported
+there might or might not be separate imaginary type
+
+
  libm implementations
  --------------------
  
@@ -82,3 +386,14 @@ double-double, quad-double arithmetics: http://crd-legacy.lbl.gov/~dhbailey/mpdi
  papers by kahan: http://www.cs.berkeley.edu/~wkahan/
  fp paper by goldberg: http://download.oracle.com/docs/cd/E19957-01/806-3568/ncg_goldberg.html
  math functions in general: http://dlmf.nist.gov/
+
+
+
+
+
+
+
+
+
+
+
author	nsz <nsz@port70.net>
	Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)
committer	nsz <nsz@port70.net>
	Sat, 25 Feb 2012 21:40:26 +0000 (22:40 +0100)