3 John Walker's Floating Point Benchmark, derived from...
5 Marinchip Interactive Lens Design System
7 John Walker December 1980
10 http://www.fourmilab.ch/
12 This program may be used, distributed, and modified freely as
13 long as the origin information is preserved.
15 This is a complete optical design raytracing algorithm,
16 stripped of its user interface and recast into portable C. It
17 not only determines execution speed on an extremely floating
18 point (including trig function) intensive real-world
19 application, it checks accuracy on an algorithm that is
20 exquisitely sensitive to errors. The performance of this
21 program is typically far more sensitive to changes in the
22 efficiency of the trigonometric library routines than the
23 average floating point program.
25 The benchmark may be compiled in two modes. If the symbol
26 INTRIG is defined, built-in trigonometric and square root
27 routines will be used for all calculations. Timings made with
28 INTRIG defined reflect the machine's basic floating point
29 performance for the arithmetic operators. If INTRIG is not
30 defined, the system library <math.h> functions are used.
31 Results with INTRIG not defined reflect the system's library
32 performance and/or floating point hardware support for trig
33 functions and square root. Results with INTRIG defined are a
34 good guide to general floating point performance, while
35 results with INTRIG undefined indicate the performance of an
36 application which is math function intensive.
38 Special note regarding errors in accuracy: this program has
39 generated numbers identical to the last digit it formats and
40 checks on the following machines, floating point
41 architectures, and languages:
43 Marinchip 9900 QBASIC IBM 370 double-precision (REAL * 8) format
45 IBM PC / XT / AT Lattice C IEEE 64 bit, 80 bit temporaries
46 High C same, in line 80x87 code
47 BASICA "Double precision"
48 Quick BASIC IEEE double precision, software routines
50 Sun 3 C IEEE 64 bit, 80 bit temporaries,
51 in-line 68881 code, in-line FPA code.
53 MicroVAX II C Vax "G" format floating point
55 Macintosh Plus MPW C SANE floating point, IEEE 64 bit format
58 Inaccuracies reported by this program should be taken VERY
59 SERIOUSLY INDEED, as the program has been demonstrated to be
60 invariant under changes in floating point format, as long as
61 the format is a recognised double precision format. If you
62 encounter errors, please remember that they are just as likely
63 to be in the floating point editing library or the
64 trigonometric libraries as in the low level operator code.
66 The benchmark assumes that results are basically reliable, and
67 only tests the last result computed against the reference. If
68 you're running on a suspect system you can compile this
69 program with ACCURACY defined. This will generate a version
70 which executes as an infinite loop, performing the ray trace
71 and checking the results on every pass. All incorrect results
74 Representative timings are given below. All have been
75 normalised as if run for 1000 iterations.
77 Time in seconds Computer, Compiler, and notes
80 3466.00 4031.00 Commodore 128, 2 Mhz 8510 with software floating
81 point. Abacus Software/Data-Becker Super-C 128,
82 version 3.00, run in fast (2 Mhz) mode. Note:
83 the results generated by this system differed
84 from the reference results in the 8th to 10th
87 3290.00 IBM PC/AT 6 Mhz, Microsoft/IBM BASICA version A3.00.
88 Run with the "/d" switch, software floating point.
90 2131.50 IBM PC/AT 6 Mhz, Lattice C version 2.14, small model.
91 This version of Lattice compiles subroutine
92 calls which either do software floating point
93 or use the 80x87. The machine on which I ran
94 this had an 80287, but the results were so bad
95 I wonder if it was being used.
97 1598.00 Macintosh Plus, MPW C, SANE Software floating point.
99 1582.13 Marinchip 9900 2 Mhz, QBASIC compiler with software
100 floating point. This was a QBASIC version of the
101 program which contained the identical algorithm.
103 404.00 IBM PC/AT 6 Mhz, Microsoft QuickBASIC version 2.0.
104 Software floating point.
106 165.15 IBM PC/AT 6 Mhz, Metaware High C version 1.3, small
107 model. This was compiled to call subroutines for
108 floating point, and the machine contained an 80287
109 which was used by the subroutines.
111 143.20 Macintosh II, MPW C, SANE calls. I was unable to
112 determine whether SANE was using the 68881 chip or
115 121.80 Sun 3/160 16 Mhz, Sun C. Compiled with -fsoft switch
116 which executes floating point in software.
118 78.78 110.11 IBM RT PC (Model 6150). IBM AIX 1.0 C compiler
121 75.2 254.0 Microsoft Quick C 1.0, in-line 8087 instructions,
122 compiled with 80286 optimisation on. (Switches
123 were -Ol -FPi87-G2 -AS). Small memory model.
125 69.50 IBM PC/AT 6Mhz, Borland Turbo BASIC 1.0. Compiled
126 in "8087 required" mode to generate in-line
127 code for the math coprocessor.
129 66.96 IBM PC/AT 6Mhz, Microsoft QuickBASIC 4.0. This
130 release of QuickBASIC compiles code for the
131 80287 math coprocessor.
133 66.36 206.35 IBM PC/AT 6Mhz, Metaware High C version 1.3, small
134 model. This was compiled with in-line code for the
135 80287 math coprocessor. Trig functions still call
138 63.07 220.43 IBM PC/AT, 6Mhz, Borland Turbo C, in-line 8087 code,
139 small model, word alignment, no stack checking,
142 17.18 Apollo DN-3000, 12 Mhz 68020 with 68881, compiled
143 with in-line code for the 68881 coprocessor.
144 According to Apollo, the library routines are chosen
145 at runtime based on coprocessor presence. Since the
146 coprocessor was present, the library is supposed to
147 use in-line floating point code.
149 15.55 27.56 VAXstation II GPX. Compiled and executed under
152 15.14 37.93 Macintosh II, Unix system V. Green Hills 68020
153 Unix compiler with in-line code for the 68881
154 coprocessor (-O -ZI switches).
156 12.69 Sun 3/160 16 Mhz, Sun C. Compiled with -fswitch,
157 which calls a subroutine to select the fastest
158 floating point processor. This was using the 68881.
160 11.74 26.73 Compaq Deskpro 386, 16 Mhz 80386 with 16 Mhz 80387.
161 Metaware High C version 1.3, compiled with in-line
162 for the math coprocessor (but not optimised for the
163 80386/80387). Trig functions still call library
166 8.43 30.49 Sun 3/160 16 Mhz, Sun C. Compiled with -f68881,
167 generating in-line MC68881 instructions. Trig
168 functions still call library routines.
170 6.29 25.17 Sun 3/260 25 Mhz, Sun C. Compiled with -f68881,
171 generating in-line MC68881 instructions. Trig
172 functions still call library routines.
174 4.57 Sun 3/260 25 Mhz, Sun FORTRAN 77. Compiled with
175 -O -f68881, generating in-line MC68881 instructions.
176 Trig functions are compiled in-line. This used
177 the FORTRAN 77 version of the program, FBFORT77.F.
179 4.00 14.20 Sun386i/25 Mhz model 250, Sun C compiler.
181 4.00 14.00 Sun386i/25 Mhz model 250, Metaware C.
183 3.10 12.00 Compaq 386/387 25 Mhz running SCO Xenix 2.
184 Compiled with Metaware HighC 386, optimized
187 3.00 12.00 Compaq 386/387 25MHZ optimized for 386/387.
189 2.96 5.17 Sun 4/260, Sparc RISC processor. Sun C,
190 compiled with the -O2 switch for global
193 2.47 COMPAQ 486/25, secondary cache disabled, High C,
194 486/387, inline f.p., small memory model.
196 2.20 3.40 Data General Motorola 88000, 16 Mhz, Gnu C.
198 1.56 COMPAQ 486/25, 128K secondary cache, High C, 486/387,
199 inline f.p., small memory model.
201 0.66 1.50 DEC Pmax, Mips processor.
203 0.63 0.91 Sun SparcStation 2, Sun C (SunOS 4.1.1) with
204 -O4 optimisation and "/usr/lib/libm.il" inline
207 0.60 1.07 Intel 860 RISC processor, 33 Mhz, Greenhills
210 0.40 0.90 Dec 3MAX, MIPS 3000 processor, -O4.
212 0.31 0.90 IBM RS/6000, -O.
214 0.1129 0.2119 Dell Dimension XPS P133c, Pentium 133 MHz,
215 Windows 95, Microsoft Visual C 5.0.
217 0.0883 0.2166 Silicon Graphics Indigo², MIPS R4400,
220 0.0351 0.0561 Dell Dimension XPS R100, Pentium II 400 MHz,
221 Windows 98, Microsoft Visual C 5.0.
223 0.0312 0.0542 Sun Ultra 2, UltraSPARC V9, 300 MHz, Solaris
226 0.00862 0.01074 Dell Inspiron 9100, Pentium 4, 3.4 GHz, gcc -O3.
237 #define cot(x) (1.0 / tan(x))
244 #define max_surfaces 10
246 /* Local variables */
248 static char tbfr[132];
250 static short current_surfaces;
251 static short paraxial;
253 static double clear_aperture;
255 static double aberr_lspher;
256 static double aberr_osc;
257 static double aberr_lchrom;
259 static double max_lspher;
260 static double max_osc;
261 static double max_lchrom;
263 static double radius_of_curvature;
264 static double object_distance;
265 static double ray_height;
266 static double axis_slope_angle;
267 static double from_index;
268 static double to_index;
270 static double spectral_line[9];
271 static double s[max_surfaces][5];
272 static double od_sa[2][2];
274 static char outarr[8][80]; /* Computed output of program goes here */
276 int itercount; /* The iteration counter for the main loop
277 in the program is made global so that
278 the compiler should not be allowed to
279 optimise out the loop over the ray
283 #define ITERATIONS 1000
285 int niter = ITERATIONS; /* Iteration counter */
287 static char *refarr[] = { /* Reference results. These happen to
288 be derived from a run on Microsoft
289 Quick BASIC on the IBM PC/AT. */
291 " Marginal ray 47.09479120920 0.04178472683",
292 " Paraxial ray 47.08372160249 0.04177864821",
293 "Longitudinal spherical aberration: -0.01106960671",
294 " (Maximum permissible): 0.05306749907",
295 "Offense against sine condition (coma): 0.00008954761",
296 " (Maximum permissible): 0.00250000000",
297 "Axial chromatic aberration: 0.00448229032",
298 " (Maximum permissible): 0.05306749907"
301 /* The test case used in this program is the design for a 4 inch
302 achromatic telescope objective used as the example in Wyld's
303 classic work on ray tracing by hand, given in Amateur Telescope
306 static double testcase[4][4] = {
307 {27.05, 1.5137, 63.6, 0.52},
308 {-16.68, 1, 0, 0.138},
309 {-16.68, 1.6164, 36.7, 0.38},
313 /* Internal trig functions (used only if INTRIG is defined). These
314 standard functions may be enabled to obtain timings that reflect
315 the machine's floating point performance rather than the speed of
316 its trig function evaluation. */
320 /* The following definitions should keep you from getting intro trouble
321 with compilers which don't let you redefine intrinsic functions. */
328 #define atan2 I_atan2
331 #define fabs(x) ((x < 0.0) ? -x : x)
333 #define pic 3.1415926535897932
335 /* Commonly used constants */
337 static double pi = pic,
340 fouroverpi = 4.0 / pic,
343 /* Coefficients for ATAN evaluation */
345 static double atanc[] = {
347 0.4636476090008061165,
348 0.7853981633974483094,
349 0.98279372324732906714,
350 1.1071487177940905022,
351 1.1902899496825317322,
352 1.2490457723982544262,
353 1.2924966677897852673,
354 1.3258176636680324644
357 /* aint(x) Return integer part of number. Truncates towards 0 */
364 /* Note that this routine cannot handle the full floating point
365 number range. This function should be in the machine-dependent
366 floating point library! */
369 if ((int)(-0.5) != 0 && l < 0 )
375 /* sin(x) Return sine, x in radians */
383 x = (((sign= (x < 0.0)) != 0) ? -x: x);
386 x -= (aint(x / twopi) * twopi);
399 r = y * (((((((-0.202253129293E-13 * z + 0.69481520350522E-11) * z -
400 0.17572474176170806E-8) * z + 0.313361688917325348E-6) * z -
401 0.365762041821464001E-4) * z + 0.249039457019271628E-2) * z -
402 0.0807455121882807815) * z + 0.785398163397448310);
404 y = (piover2 - x) * fouroverpi;
406 r = ((((((-0.38577620372E-12 * z + 0.11500497024263E-9) * z -
407 0.2461136382637005E-7) * z + 0.359086044588581953E-5) * z -
408 0.325991886926687550E-3) * z + 0.0158543442438154109) * z -
409 0.308425137534042452) * z + 1.0;
411 return sign ? -r : r;
414 /* cos(x) Return cosine, x in radians, by identity */
419 x = (x < 0.0) ? -x : x;
420 if (x > twopi) /* Do range reduction here to limit */
421 x = x - (aint(x / twopi) * twopi); /* roundoff on add of PI/2 */
422 return sin(x + piover2);
425 /* tan(x) Return tangent, x in radians, by identity */
430 return sin(x) / cos(x);
433 /* sqrt(x) Return square root. Initial guess, then Newton-
434 Raphson refinement */
447 "\nGood work! You tried to take the square root of %g",
450 "\nunfortunately, that is too complex for me to handle.\n");
454 y = (0.154116 + 1.893872 * x) / (1.0 + 1.047988 * x);
456 c = (y - x / y) / 2.0;
458 for (n = 50; c != cl && n--;) {
461 c = (y - x / y) / 2.0;
466 /* atan(x) Return arctangent in radians,
467 range -pi/2 to pi/2 */
469 static double atan(x)
475 x = (((sign = (x < 0.0)) != 0) ? -x : x);
492 x = (x - z) / (x * z + 1);
496 b = ((((893025.0 * z + 49116375.0) * z + 425675250.0) * z +
497 1277025750.0) * z + 1550674125.0) * z + 654729075.0;
498 a = (((13852575.0 * z + 216602100.0) * z + 891080190.0) * z +
499 1332431100.0) * z + 654729075.0;
500 a = (a / b) * x + atanc[y];
503 return sign ? -a : a;
506 /* atan2(y,x) Return arctangent in radians of y/x,
509 static double atan2(y, x)
515 if (y == 0.0) /* Special case: atan2(0,0) = 0 */
532 /* asin(x) Return arcsine in radians of x */
534 static double asin(x)
539 "\nInverse trig functions lose much of their gloss when");
541 "\ntheir arguments are greater than 1, such as the");
543 "\nvalue %g you passed.\n", x);
546 return atan2(x, sqrt(1 - x * x));
550 /* Calculate passage through surface
552 If the variable PARAXIAL is true, the trace through the
553 surface will be done using the paraxial approximations.
554 Otherwise, the normal trigonometric trace will be done.
556 This routine takes the following inputs:
558 RADIUS_OF_CURVATURE Radius of curvature of surface
559 being crossed. If 0, surface is
562 OBJECT_DISTANCE Distance of object focus from
563 lens vertex. If 0, incoming
564 rays are parallel and
565 the following must be specified:
567 RAY_HEIGHT Height of ray from axis. Only
568 relevant if OBJECT.DISTANCE == 0
570 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
573 FROM_INDEX Refractive index of medium being left
575 TO_INDEX Refractive index of medium being
578 The outputs are the following variables:
580 OBJECT_DISTANCE Distance from vertex to object focus
583 AXIS_SLOPE_ANGLE Angle incoming ray makes with axis
584 at intercept after refraction.
588 static void transit_surface() {
589 double iang, /* Incidence angle */
590 rang, /* Refraction angle */
591 iang_sin, /* Incidence angle sin */
592 rang_sin, /* Refraction angle sin */
593 old_axis_slope_angle, sagitta;
596 if (radius_of_curvature != 0.0) {
597 if (object_distance == 0.0) {
598 axis_slope_angle = 0.0;
599 iang_sin = ray_height / radius_of_curvature;
601 iang_sin = ((object_distance -
602 radius_of_curvature) / radius_of_curvature) *
605 rang_sin = (from_index / to_index) *
607 old_axis_slope_angle = axis_slope_angle;
608 axis_slope_angle = axis_slope_angle +
610 if (object_distance != 0.0)
611 ray_height = object_distance * old_axis_slope_angle;
612 object_distance = ray_height / axis_slope_angle;
615 object_distance = object_distance * (to_index / from_index);
616 axis_slope_angle = axis_slope_angle * (from_index / to_index);
620 if (radius_of_curvature != 0.0) {
621 if (object_distance == 0.0) {
622 axis_slope_angle = 0.0;
623 iang_sin = ray_height / radius_of_curvature;
625 iang_sin = ((object_distance -
626 radius_of_curvature) / radius_of_curvature) *
627 sin(axis_slope_angle);
629 iang = asin(iang_sin);
630 rang_sin = (from_index / to_index) *
632 old_axis_slope_angle = axis_slope_angle;
633 axis_slope_angle = axis_slope_angle +
634 iang - asin(rang_sin);
635 sagitta = sin((old_axis_slope_angle + iang) / 2.0);
636 sagitta = 2.0 * radius_of_curvature*sagitta*sagitta;
637 object_distance = ((radius_of_curvature * sin(
638 old_axis_slope_angle + iang)) *
639 cot(axis_slope_angle)) + sagitta;
643 rang = -asin((from_index / to_index) *
644 sin(axis_slope_angle));
645 object_distance = object_distance * ((to_index *
646 cos(-rang)) / (from_index *
647 cos(axis_slope_angle)));
648 axis_slope_angle = -rang;
651 /* Perform ray trace in specific spectral line */
653 static void trace_line(line, ray_h)
659 object_distance = 0.0;
663 for (i = 1; i <= current_surfaces; i++) {
664 radius_of_curvature = s[i][1];
667 to_index = to_index + ((spectral_line[4] -
668 spectral_line[line]) /
669 (spectral_line[3] - spectral_line[6])) * ((s[i][2] - 1.0) /
672 from_index = to_index;
673 if (i < current_surfaces)
674 object_distance = object_distance - s[i][4];
678 /* Initialise when called the first time */
685 double od_fline, od_cline;
690 spectral_line[1] = 7621.0; /* A */
691 spectral_line[2] = 6869.955; /* B */
692 spectral_line[3] = 6562.816; /* C */
693 spectral_line[4] = 5895.944; /* D */
694 spectral_line[5] = 5269.557; /* E */
695 spectral_line[6] = 4861.344; /* F */
696 spectral_line[7] = 4340.477; /* G'*/
697 spectral_line[8] = 3968.494; /* H */
699 /* Process the number of iterations argument, if one is supplied. */
702 niter = atoi(argv[1]);
703 if (*argv[1] == '-' || niter < 1) {
704 printf("This is John Walker's floating point accuracy and\n");
705 printf("performance benchmark program. You call it with\n");
706 printf("\nfbench <itercount>\n\n");
707 printf("where <itercount> is the number of iterations\n");
708 printf("to be executed. Archival timings should be made\n");
709 printf("with the iteration count set so that roughly five\n");
710 printf("minutes of execution is timed.\n");
715 /* Load test case into working array */
717 clear_aperture = 4.0;
718 current_surfaces = 4;
719 for (i = 0; i < current_surfaces; i++)
720 for (j = 0; j < 4; j++)
721 s[i + 1][j + 1] = testcase[i][j];
724 printf("Beginning execution of floating point accuracy test...\n");
727 printf("Ready to begin John Walker's floating point accuracy\n");
728 printf("and performance benchmark. %d iterations will be made.\n\n",
731 printf("\nMeasured run time in seconds should be divided by %.f\n", niter / 1000.0);
732 printf("to normalise for reporting results. For archival results,\n");
733 printf("adjust iteration count so the benchmark runs about five minutes.\n\n");
737 /* Perform ray trace the specified number of times. */
742 if ((passes % 100L) == 0) {
743 printf("Pass %ld.\n", passes);
746 for (itercount = 0; itercount < niter; itercount++) {
749 for (paraxial = 0; paraxial <= 1; paraxial++) {
751 /* Do main trace in D light */
753 trace_line(4, clear_aperture / 2.0);
754 od_sa[paraxial][0] = object_distance;
755 od_sa[paraxial][1] = axis_slope_angle;
759 /* Trace marginal ray in C */
761 trace_line(3, clear_aperture / 2.0);
762 od_cline = object_distance;
764 /* Trace marginal ray in F */
766 trace_line(6, clear_aperture / 2.0);
767 od_fline = object_distance;
769 aberr_lspher = od_sa[1][0] - od_sa[0][0];
770 aberr_osc = 1.0 - (od_sa[1][0] * od_sa[1][1]) /
771 (sin(od_sa[0][1]) * od_sa[0][0]);
772 aberr_lchrom = od_fline - od_cline;
773 max_lspher = sin(od_sa[0][1]);
777 max_lspher = 0.0000926 / (max_lspher * max_lspher);
779 max_lchrom = max_lspher;
783 printf("Stop the timer:\007");
787 /* Now evaluate the accuracy of the results from the last ray trace */
789 sprintf(outarr[0], "%15s %21.11f %14.11f",
790 "Marginal ray", od_sa[0][0], od_sa[0][1]);
791 sprintf(outarr[1], "%15s %21.11f %14.11f",
792 "Paraxial ray", od_sa[1][0], od_sa[1][1]);
794 "Longitudinal spherical aberration: %16.11f",
797 " (Maximum permissible): %16.11f",
800 "Offense against sine condition (coma): %16.11f",
803 " (Maximum permissible): %16.11f",
806 "Axial chromatic aberration: %16.11f",
809 " (Maximum permissible): %16.11f",
812 /* Now compare the edited results with the master values from
813 reference executions of this program. */
816 for (i = 0; i < 8; i++) {
817 if (strcmp(outarr[i], refarr[i]) != 0) {
819 printf("\nError in pass %ld for results on line %d...\n",
822 printf("\nError in results on line %d...\n", i + 1);
824 printf("Expected: \"%s\"\n", refarr[i]);
825 printf("Received: \"%s\"\n", outarr[i]);
827 k = strlen(refarr[i]);
828 for (j = 0; j < k; j++) {
829 printf("%c", refarr[i][j] == outarr[i][j] ? ' ' : '^');
830 if (refarr[i][j] != outarr[i][j])
840 printf("\n%d error%s in results. This is VERY SERIOUS.\n",
841 errors, errors > 1 ? "s" : "");
843 printf("\nNo errors in results.\n");