[fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64
J. Gareth Moreton
gareth at moreton-family.com
Sun Apr 29 02:46:57 CEST 2018
I've done some speed and accuracy comparisons between our respective Frac
functions. Initially, my "SafeFrac" was marginally faster than
"FracDoSkip", but I managed to optimise Thorsten's routine a little bit
into the following:
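function FracSkip2(const X: ValReal): ValReal; assembler; nostackframe;
asm
  align 16
  movq      rax, xmm0     { raw bits of X }
  shr       rax, 48       { keep the top 16 bits: sign, exponent, 4 mantissa bits }
  and       ax, $7FF0     { isolate the 11-bit biased exponent }
  cmp       ax, $4330     { exponent >= 1023 + 52: no fractional bits (also Inf/NaN) }
  jge       @@zero
  cmp       ax, $3FE0     { exponent <= 1022: |X| < 1, so X is its own fraction }
  jbe       @@skip
  cvttsd2si rax, xmm0     { truncate towards zero to a 64-bit integer }
  cvtsi2sd  xmm4, rax     { convert the integer part back to a double }
  subsd     xmm0, xmm4    { X minus its integer part }
  ret
@@zero:
  xorpd     xmm0, xmm0    { return +0.0 }
@@skip:
end;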
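For anyone reading along without the assembly in front of them, the two
exponent tests can be sketched in plain Pascal. This is only an
illustration and was not part of the benchmark: FracSketch and its local
names are made up for the example, and it assumes Double is a 64-bit
IEEE 754 value.

```pascal
program FracSketchDemo;
{$mode objfpc}

uses
  Math;  // Infinity, NegInfinity

{ Hypothetical plain-Pascal counterpart of the assembly routine: the same
  two exponent tests, written with ordinary integer operations. }
function FracSketch(const X: Double): Double;
var
  Bits: QWord;
  Top: Word;
begin
  Move(X, Bits, SizeOf(Bits));            // raw IEEE 754 bit pattern of X
  Top := Word((Bits shr 48) and $7FF0);   // biased exponent, still shifted left by 4
  if Top >= $4330 then
    Result := 0.0                         // |X| >= 2^52, Inf or NaN: no fraction
  else if Top <= $3FE0 then
    Result := X                           // |X| < 1: X is its own fraction
  else
    Result := X - Trunc(X);               // 1 <= |X| < 2^52: Trunc fits in Int64
end;

begin
  if (FracSketch(1.5) = 0.5) and (FracSketch(-1.5) = -0.5) and
     (FracSketch(0.125) = 0.125) and
     (FracSketch(4503599627370496.0) = 0.0) and
     (FracSketch(Infinity) = 0.0) and (FracSketch(NegInfinity) = 0.0) then
    WriteLn('all checks passed')
  else
    WriteLn('mismatch');
end.
```

The constants line up with the assembly: $4330 is the exponent field of
2^52 (1023 + 52 = $433) and $3FE0 is the exponent field of values in
[0.5, 1), both shifted left by the four mantissa bits that survive the
16-bit mask.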
My test compared Frac, FracDoSkip, SafeFrac and what I call FracSkip2
above, which reworks the comparisons to use only 16 bits and replaces "jmp
@@skip" with "ret". The results are below; as you can see, all three
replacements are a great improvement over Frac. Frac raises a
floating-point exception (reported as EInvalidOp in the results below) if
plus or minus infinity is passed in, whereas our functions return zero.
This may or may not be a desirable change.
Code sizes (alignment will round each one up to the next multiple of 16
bytes): Frac = 49 bytes, FracDoSkip = 52 bytes, SafeFrac = 46 bytes,
FracSkip2 = 45 bytes.
Long story short, with a few tweaks, Thorsten's routine is the smallest
and, on nearly all of the inputs below, also the fastest (SafeFrac edges
it out on a handful of values).
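To make the effect of that rounding concrete, here is a small sketch that
rounds the four quoted sizes up to a multiple of 16. AlignUp16 is a
hypothetical helper written for this example; the byte counts are the ones
quoted above.

```pascal
program AlignedSizes;
{$mode objfpc}

{ Round a byte count up to the next multiple of 16. }
function AlignUp16(Size: Integer): Integer;
begin
  Result := (Size + 15) and not 15;
end;

begin
  WriteLn('Frac       ', AlignUp16(49));  // 64
  WriteLn('FracDoSkip ', AlignUp16(52));  // 64
  WriteLn('SafeFrac   ', AlignUp16(46));  // 48
  WriteLn('FracSkip2  ', AlignUp16(45));  // 48
end.
```

So once padded, SafeFrac and FracSkip2 both take 48 bytes, while Frac and
FracDoSkip both take 64.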
****
My test set was:
DataSet: array[0..14] of Double = (
  1.5, 0, 2251799813685248, 4503599627370496, 1E300, 0.125,
  3.6415926535897932384626433832795,
  -1.5, -2251799813685248, -4503599627370496, -1E300, -0.125,
  -3.6415926535897932384626433832795,
  Infinity, NegInfinity);
Each value is tested as is, then as DataSet[X] + 0.5, then as DataSet[X]
- 0.5. That seemed the best way to see how precision is handled without
the arithmetic being optimised out by the compiler.
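A test loop of that shape might look roughly like the sketch below. It is
not the harness used for the numbers in this post: CandidateFrac, RunOne,
the Samples subset and the Iterations count are all assumptions made for
illustration, and GetTickCount64 only has millisecond resolution, so the
per-call figure is a coarse average.

```pascal
program FracTestSketch;
{$mode objfpc}

uses
  SysUtils, Math;

type
  TFracFunc = function(const X: Double): Double;

const
  Iterations = 10000000;  // assumed repeat count, not the author's figure
  Offsets: array[0..2] of Double = (0.0, 0.5, -0.5);
  // Shortened, illustrative subset of the data set above
  Samples: array[0..3] of Double = (1.5, 4503599627370496, -0.125, Infinity);

{ Hypothetical stand-in for the routine under test; swap in the real one. }
function CandidateFrac(const X: Double): Double;
begin
  if Abs(X) >= 4503599627370496.0 then
    Result := 0.0              // too large for fractional bits (or infinite)
  else
    Result := X - Trunc(X);
end;

procedure RunOne(F: TFracFunc; const X: Double);
var
  I: Integer;
  Start, Elapsed: QWord;
  R: Double;
begin
  R := 0.0;
  Start := GetTickCount64;
  for I := 1 to Iterations do
    R := F(X);
  Elapsed := GetTickCount64 - Start;   // milliseconds for the whole loop
  WriteLn(X, ' -> ', R, '  ', (Elapsed * 1.0E6 / Iterations):0:3, ' ns');
end;

var
  S, O: Integer;
begin
  // Each sample is tried as is, then shifted by +0.5 and by -0.5.
  for S := Low(Samples) to High(Samples) do
    for O := Low(Offsets) to High(Offsets) do
      RunOne(@CandidateFrac, Samples[S] + Offsets[O]);
end.
```

Passing the function as a procedural value keeps the loop identical for
every candidate, at the cost of an indirect call that a real benchmark
might prefer to avoid.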
****
Times are in ns. For each input, every function that returned a value
returned the same result, shown once in the Result column, and each of
those calls was reported as Pass.

Input                     Result                          Frac  FracDoSkip  SafeFrac  FracSkip2
 1.5000000000000000E+000   5.0000000000000000E-001     124.483      47.525    32.707     34.904
 2.0000000000000000E+000   0.0000000000000000E+000     126.170      51.210    35.351     33.911
 1.0000000000000000E+000   0.0000000000000000E+000     125.927      49.127    34.695     36.139
 0.0000000000000000E+000   0.0000000000000000E+000     119.800      40.316    35.875     34.046
 5.0000000000000000E-001   5.0000000000000000E-001     118.913      40.183    36.783     34.976
-5.0000000000000000E-001  -5.0000000000000000E-001     127.560      41.676    36.577     34.714
 2.2517998136852480E+015   0.0000000000000000E+000     126.323      49.108    35.376     36.373
 2.2517998136852485E+015   5.0000000000000000E-001     131.001      54.474    38.834     37.139
 2.2517998136852475E+015   5.0000000000000000E-001     131.932      52.214    37.093     35.674
 4.5035996273704960E+015   0.0000000000000000E+000      82.749      38.613    38.575     33.970
 4.5035996273704960E+015   0.0000000000000000E+000      86.126      38.434    38.636     33.747
 4.5035996273704955E+015   5.0000000000000000E-001     131.589      53.594    36.617     36.509
 1.0000000000000001E+300   0.0000000000000000E+000      82.875      39.008    39.112     34.195
 1.0000000000000001E+300   0.0000000000000000E+000      85.401      38.653    38.655     34.408
 1.0000000000000001E+300   0.0000000000000000E+000      84.719      39.174    38.876     33.570
 1.2500000000000000E-001   1.2500000000000000E-001     123.770      41.642    38.704     35.399
 6.2500000000000000E-001   6.2500000000000000E-001     128.967      42.082    38.199     36.072
-3.7500000000000000E-001  -3.7500000000000000E-001     128.962      40.375    37.153     34.515
 3.6415926535897931E+000   6.4159265358979312E-001     129.245      53.440    38.390     36.963
 4.1415926535897931E+000   1.4159265358979312E-001     132.623      52.325    39.016     36.818
 3.1415926535897931E+000   1.4159265358979312E-001     128.032      49.834    37.077     37.099
-1.5000000000000000E+000  -5.0000000000000000E-001     132.057      53.112    38.287     36.849
-1.0000000000000000E+000   0.0000000000000000E+000     130.452      51.451    36.993     36.110
-2.0000000000000000E+000   0.0000000000000000E+000     131.912      52.946    38.330     37.156
-2.2517998136852480E+015   0.0000000000000000E+000     131.354      53.712    36.978     36.262
-2.2517998136852475E+015  -5.0000000000000000E-001     127.641      52.853    38.318     37.286
-2.2517998136852485E+015  -5.0000000000000000E-001     130.918      52.916    37.928     36.701
-4.5035996273704960E+015   0.0000000000000000E+000      82.714      37.410    37.091     33.130
-4.5035996273704955E+015  -5.0000000000000000E-001     131.699      52.932    38.499     37.341
-4.5035996273704960E+015   0.0000000000000000E+000      85.069      38.384    39.041     34.266
-1.0000000000000001E+300   0.0000000000000000E+000      81.913      37.216    37.385     34.328
-1.0000000000000001E+300   0.0000000000000000E+000      85.317      38.639    38.644     34.293
-1.0000000000000001E+300   0.0000000000000000E+000      85.878      38.932    38.651     34.316
-1.2500000000000000E-001  -1.2500000000000000E-001     128.603      41.592    37.280     34.995
 3.7500000000000000E-001   3.7500000000000000E-001     124.473      42.099    38.716     36.194
-6.2500000000000000E-001  -6.2500000000000000E-001     129.138      42.219    39.724     34.307
-3.6415926535897931E+000  -6.4159265358979312E-001     129.833      51.274    38.494     37.459
-3.1415926535897931E+000  -1.4159265358979312E-001     132.230      53.066    38.658     36.351
-4.1415926535897931E+000  -1.4159265358979312E-001     126.783      51.889    38.785     36.711
+Inf                       0.0000000000000000E+000  EInvalidOp      39.849    38.889     34.289
+Inf                       0.0000000000000000E+000  EInvalidOp      40.781    37.504     33.043
+Inf                       0.0000000000000000E+000  EInvalidOp      40.993    39.575     33.041
-Inf                       0.0000000000000000E+000  EInvalidOp      40.414    37.835     33.294
-Inf                       0.0000000000000000E+000  EInvalidOp      39.871    37.885     34.041
-Inf                       0.0000000000000000E+000  EInvalidOp      40.437    38.868     33.819

EInvalidOp: Frac raised EInvalidOp with message "Invalid floating point
operation" instead of returning a value; the Result column in those rows
applies to the other three functions only.
****
Gareth aka. Kit