[fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

J. Gareth Moreton gareth at moreton-family.com
Sun Apr 29 02:46:57 CEST 2018


 I've done some speed and accuracy comparisons between our respective Frac
functions.  Initially, my "SafeFrac" was marginally faster than
"FracDoSkip", but I managed to optimise Thorsten's routine a little bit
into the following:

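 function FracSkip2(const X: ValReal): ValReal; assembler; nostackframe;
 asm
   align 16
   movq      rax,  xmm0      // raw IEEE 754 bits of X
   shr       rax,  48        // keep sign, exponent and top 4 mantissa bits
   and       ax,   $7FF0     // isolate the 11-bit biased exponent
   cmp       ax,   $4330     // exponent >= 1023 + 52, i.e. |X| >= 2^52?
   jge       @@zero          // yes: no fractional bits (covers Inf and NaN)
   cmp       ax,   $3FE0     // exponent <= 1022, i.e. |X| < 1?
   jbe       @@skip          // yes: X is already its own fractional part
   cvttsd2si rax,  xmm0      // truncate toward zero to a 64-bit integer
   cvtsi2sd  xmm4, rax       // convert the integer part back to a double
   subsd     xmm0, xmm4      // X - Trunc(X)
   ret
 @@zero:
   xorpd     xmm0, xmm0      // return +0.0
 @@skip:
 end;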

 My test compared Frac, FracDoSkip, SafeFrac and what I call FracSkip2
above, which reworks the comparisons to use only 16 bits and replaces "jmp
@@skip" with "ret".  The results are listed at the end of this message; all
three replacements are a great improvement over Frac.  Frac raises a
floating-point exception (reported as EInvalidOp in my test) if plus or
minus infinity is passed in, whereas our functions return zero.  This may
or may not be a desirable change.
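
 For readers who do not read assembly, the two exponent tests in FracSkip2 can be restated in plain Pascal.  This is only a sketch under assumptions: FracPortable is a made-up name (it is not a routine from this thread or from the RTL), it assumes a 64-bit IEEE 754 Double, and it makes no attempt to match the speed of the assembler versions.

```pascal
program FracSketch;
{$mode objfpc}

{ Hypothetical portable restatement of the checks made by FracSkip2. }
function FracPortable(const X: Double): Double;
var
  Bits: QWord;   { raw bit pattern of X }
  Expo: Word;    { biased exponent, left in bits 4..14 as in AX }
begin
  Move(X, Bits, SizeOf(Bits));
  { Top 16 bits with the sign and the four mantissa bits masked away. }
  Expo := (Bits shr 48) and $7FF0;
  if Expo >= $4330 then
    Result := 0.0              { |X| >= 2^52, infinity or NaN }
  else if Expo <= $3FE0 then
    Result := X                { |X| < 1: nothing to strip }
  else
    Result := X - Trunc(X);    { Trunc fits in Int64 since |X| < 2^52 }
end;

begin
  WriteLn(FracPortable(1.5));
  WriteLn(FracPortable(-0.125));
  WriteLn(FracPortable(4503599627370496.0));
end.
```

 The thresholds are the biased exponents 1023 + 52 = $433 and 1022 = $3FE, shifted left by four bits because the exponent field is left in place after the 48-bit shift.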

 Code sizes (alignment will round each up to the next multiple of 16
bytes): Frac = 49 bytes, FracDoSkip = 52 bytes, SafeFrac = 46 bytes,
FracSkip2 = 45 bytes.
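
 The effect of that rounding on the four figures can be worked out with a few lines of Pascal.  RoundUp16 is a hypothetical helper written for this illustration, and the sketch assumes each routine starts on a 16-byte boundary so that its footprint is simply its size rounded up.

```pascal
program AlignSizes;
{$mode objfpc}

{ Hypothetical helper: round a byte count up to a multiple of 16. }
function RoundUp16(Size: Integer): Integer;
begin
  Result := ((Size + 15) div 16) * 16;
end;

begin
  WriteLn('Frac       = ', RoundUp16(49));  { 64 }
  WriteLn('FracDoSkip = ', RoundUp16(52));  { 64 }
  WriteLn('SafeFrac   = ', RoundUp16(46));  { 48 }
  WriteLn('FracSkip2  = ', RoundUp16(45));  { 48 }
end.
```

 Under that assumption Frac and FracDoSkip each occupy 64 bytes once padded, while SafeFrac and FracSkip2 both fit in 48.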

 Long story short, with a few tweaks, Thorsten's routine is the fastest in
all but a handful of cases and also the smallest.

 ****

 My test set was:

 DataSet: array[0..14] of Double = (1.5, 0, 2251799813685248,
4503599627370496, 1E300, 0.125, 3.6415926535897932384626433832795, -1.5,
-2251799813685248, -4503599627370496, -1E300, -0.125,
-3.6415926535897932384626433832795, Infinity, NegInfinity);

 Each value is tested as is, then as DataSet[X] + 0.5, then as DataSet[X]
- 0.5 (the best way to determine how precision is handled without the test
being optimised out by the compiler).
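
 The way the 45 inputs are derived from the 15 values can be sketched as follows.  MakeInput and Offsets are names invented for this illustration, the original test program is not shown in this message, and the sketch assumes x86_64, where Double arithmetic is done in SSE registers.

```pascal
program FracInputs;
{$mode objfpc}
uses
  Math;   { provides Infinity and NegInfinity }

const
  DataSet: array[0..14] of Double = (1.5, 0, 2251799813685248,
    4503599627370496, 1E300, 0.125, 3.6415926535897932384626433832795,
    -1.5, -2251799813685248, -4503599627370496, -1E300, -0.125,
    -3.6415926535897932384626433832795, Infinity, NegInfinity);
  Offsets: array[0..2] of Double = (0.0, 0.5, -0.5);

{ Hypothetical helper: the value itself, then +0.5, then -0.5. }
function MakeInput(Index, Which: Integer): Double;
begin
  Result := DataSet[Index] + Offsets[Which];
end;

var
  I, J: Integer;
begin
  for I := Low(DataSet) to High(DataSet) do
    for J := Low(Offsets) to High(Offsets) do
      WriteLn(MakeInput(I, J));
end.
```

 This also explains why some inputs repeat in the results.  At 2^52 = 4503599627370496 adjacent doubles are exactly 1 apart, so adding 0.5 rounds back to the same value, whereas at 2^51 = 2251799813685248 the spacing is 0.5 and the sum is exact.  Likewise 1E300 and the infinities are unchanged by either offset.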

 ****

 All times are in nanoseconds.  For every finite input, all four functions
returned the result shown and the check passed.  For the infinite inputs,
the result shown applies to FracDoSkip, SafeFrac and FracSkip2 (all
passed), while Frac raised EInvalidOp with the message "Invalid floating
point operation".

 Input                     Result                          Frac  FracDoSkip  SafeFrac  FracSkip2
  1.5000000000000000E+000   5.0000000000000000E-001     124.483      47.525    32.707     34.904
  2.0000000000000000E+000   0.0000000000000000E+000     126.170      51.210    35.351     33.911
  1.0000000000000000E+000   0.0000000000000000E+000     125.927      49.127    34.695     36.139
  0.0000000000000000E+000   0.0000000000000000E+000     119.800      40.316    35.875     34.046
  5.0000000000000000E-001   5.0000000000000000E-001     118.913      40.183    36.783     34.976
 -5.0000000000000000E-001  -5.0000000000000000E-001     127.560      41.676    36.577     34.714
  2.2517998136852480E+015   0.0000000000000000E+000     126.323      49.108    35.376     36.373
  2.2517998136852485E+015   5.0000000000000000E-001     131.001      54.474    38.834     37.139
  2.2517998136852475E+015   5.0000000000000000E-001     131.932      52.214    37.093     35.674
  4.5035996273704960E+015   0.0000000000000000E+000      82.749      38.613    38.575     33.970
  4.5035996273704960E+015   0.0000000000000000E+000      86.126      38.434    38.636     33.747
  4.5035996273704955E+015   5.0000000000000000E-001     131.589      53.594    36.617     36.509
  1.0000000000000001E+300   0.0000000000000000E+000      82.875      39.008    39.112     34.195
  1.0000000000000001E+300   0.0000000000000000E+000      85.401      38.653    38.655     34.408
  1.0000000000000001E+300   0.0000000000000000E+000      84.719      39.174    38.876     33.570
  1.2500000000000000E-001   1.2500000000000000E-001     123.770      41.642    38.704     35.399
  6.2500000000000000E-001   6.2500000000000000E-001     128.967      42.082    38.199     36.072
 -3.7500000000000000E-001  -3.7500000000000000E-001     128.962      40.375    37.153     34.515
  3.6415926535897931E+000   6.4159265358979312E-001     129.245      53.440    38.390     36.963
  4.1415926535897931E+000   1.4159265358979312E-001     132.623      52.325    39.016     36.818
  3.1415926535897931E+000   1.4159265358979312E-001     128.032      49.834    37.077     37.099
 -1.5000000000000000E+000  -5.0000000000000000E-001     132.057      53.112    38.287     36.849
 -1.0000000000000000E+000   0.0000000000000000E+000     130.452      51.451    36.993     36.110
 -2.0000000000000000E+000   0.0000000000000000E+000     131.912      52.946    38.330     37.156
 -2.2517998136852480E+015   0.0000000000000000E+000     131.354      53.712    36.978     36.262
 -2.2517998136852475E+015  -5.0000000000000000E-001     127.641      52.853    38.318     37.286
 -2.2517998136852485E+015  -5.0000000000000000E-001     130.918      52.916    37.928     36.701
 -4.5035996273704960E+015   0.0000000000000000E+000      82.714      37.410    37.091     33.130
 -4.5035996273704955E+015  -5.0000000000000000E-001     131.699      52.932    38.499     37.341
 -4.5035996273704960E+015   0.0000000000000000E+000      85.069      38.384    39.041     34.266
 -1.0000000000000001E+300   0.0000000000000000E+000      81.913      37.216    37.385     34.328
 -1.0000000000000001E+300   0.0000000000000000E+000      85.317      38.639    38.644     34.293
 -1.0000000000000001E+300   0.0000000000000000E+000      85.878      38.932    38.651     34.316
 -1.2500000000000000E-001  -1.2500000000000000E-001     128.603      41.592    37.280     34.995
  3.7500000000000000E-001   3.7500000000000000E-001     124.473      42.099    38.716     36.194
 -6.2500000000000000E-001  -6.2500000000000000E-001     129.138      42.219    39.724     34.307
 -3.6415926535897931E+000  -6.4159265358979312E-001     129.833      51.274    38.494     37.459
 -3.1415926535897931E+000  -1.4159265358979312E-001     132.230      53.066    38.658     36.351
 -4.1415926535897931E+000  -1.4159265358979312E-001     126.783      51.889    38.785     36.711
 +Inf                       0.0000000000000000E+000  EInvalidOp      39.849    38.889     34.289
 +Inf                       0.0000000000000000E+000  EInvalidOp      40.781    37.504     33.043
 +Inf                       0.0000000000000000E+000  EInvalidOp      40.993    39.575     33.041
 -Inf                       0.0000000000000000E+000  EInvalidOp      40.414    37.835     33.294
 -Inf                       0.0000000000000000E+000  EInvalidOp      39.871    37.885     34.041
 -Inf                       0.0000000000000000E+000  EInvalidOp      40.437    38.868     33.819

 ****

 Gareth aka. Kit
 