[fpc-devel] Re: Broken frac function in FPC3.1.1 / Windows x86_64
Thorsten Engler
thorsten.engler at gmx.net
Sat Apr 28 17:57:59 CEST 2018
> -----Original Message-----
> From: fpc-devel <fpc-devel-bounces at lists.freepascal.org> On Behalf
> Of Florian Klämpfl
> So something like
>
> cmp edx, $43300000
> jge @@zero
> cmp edx, $3FE00000
> .align 16
> jbe @@skip
>
> might be much better.
That ended up making things worse in some cases.
Here is a branchless version:
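function Frac1(const X: Double): Double;
asm
  .noframe
  movq      rdx, xmm0        // rdx := raw IEEE 754 bits of X
  mov       rax, rdx         // keep a full copy of the bits in rax
  xor       rcx, rcx         // rcx := 0, the bit pattern of +0.0
  shr       rdx, 32          // edx := upper dword (sign, exponent, top mantissa bits)
  and       edx, $7FF00000   // keep only the 11 exponent bits
  cmp       edx, $43300000   // |X| >= 2^52? (also true for Inf and NaN)
  cmovge    rax, rcx         // if so, no fractional bits exist: replace X by 0
  movq      xmm0, rax
  cvttsd2si rax, xmm0        // truncate toward zero to a 64-bit integer
  cvtsi2sd  xmm4, rax        // convert the integer part back to double
  subsd     xmm0, xmm4       // result := X - Trunc(X)
end;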
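For readers who prefer not to trace the assembly, the same logic can be sketched in plain Pascal. This is only an illustration under stated assumptions: FracRef is a hypothetical name that does not appear in the thread, and the sketch makes no claim about speed.

```pascal
program FracSketch;

{$mode objfpc}

// Hypothetical pure-Pascal counterpart of Frac1 (name not from the thread).
function FracRef(const X: Double): Double;
var
  Y: Double;
  Bits: QWord;
  HighExp: LongWord;
begin
  Y := X;
  Move(Y, Bits, SizeOf(Bits));                 // raw IEEE 754 bit pattern
  // Upper dword with sign and mantissa bits masked off.
  HighExp := LongWord(Bits shr 32) and $7FF00000;
  if HighExp >= $43300000 then                 // |X| >= 2^52, Inf or NaN
    Y := 0.0;                                  // mirrors cmovge rax, rcx
  Result := Y - Trunc(Y);                      // safe: |Y| < 2^52 fits in Int64
end;

begin
  WriteLn(FracRef(1e15 + 0.5):0:3);   // "in range" benchmark input
  WriteLn(FracRef(1e16 + 0.5):0:3);   // "out of range" benchmark input
  WriteLn(FracRef(0.5):0:3);          // "only fraction" benchmark input
  WriteLn(FracRef(-2.75):0:3);        // negative input keeps its sign
end.
```

Unlike the assembly, the sketch uses a branch for the range check; it is meant to show what is computed, not how the branchless version achieves it.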
It performs slightly slower in the "in range" case and noticeably slower in the other 2 cases (as it executes exactly the same instructions for all 3).
I would guess that the "in range" case is the most common: you aren't going to call Frac if you know ahead of time that the result is always 0 because the number is too big, or that the value already lies between -1 and 1. So the higher cost for the out of range and only fraction cases is probably less important than it might look.
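The three cases can be told apart by the same exponent test that Frac1 performs. A minimal sketch, assuming a hypothetical helper MaskedHigh (not from the thread) that reproduces the shr/and steps in plain Pascal and applies them to the three benchmark inputs:

```pascal
program ExpBits;

{$mode objfpc}

// Hypothetical helper: the masked upper dword that Frac1 compares
// against $43300000.
function MaskedHigh(const X: Double): LongWord;
var
  Bits: QWord;
begin
  Move(X, Bits, SizeOf(Bits));                  // raw IEEE 754 bit pattern
  Result := LongWord(Bits shr 32) and $7FF00000;
end;

procedure Show(const Name: string; const X: Double);
begin
  Write(Name, ': $', HexStr(Int64(MaskedHigh(X)), 8));
  if MaskedHigh(X) >= $43300000 then
    WriteLn('  -> treated as 0')
  else
    WriteLn('  -> truncate and subtract');
end;

begin
  Show('1e15+0.5', 1e15 + 0.5);
  Show('1e16+0.5', 1e16 + 0.5);
  Show('0.5     ', 0.5);
end.
```

With IEEE 754 doubles this prints $43000000, $43400000 and $3FE00000. Only the second reaches the $43300000 threshold, which corresponds to 2^52, the magnitude from which a Double no longer has fractional bits. Note that 1e16+0.5 is not representable as a Double and is stored as 1e16.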
It IS largely independent of code alignment or predictable patterns in the incoming value:
Code address:
Frac1: 0000000000536430 (48)
Frac2: 0000000000536480 (0)
Frac3: 00000000005364D0 (80)
Frac4: 0000000000536520 (32)
Frac5: 0000000000536570 (112)
Frac6: 00000000005365C0 (64)
Frac7: 0000000000536610 (16)
Frac8: 0000000000536660 (96)
1st run:
In range (1e15+0.5):
Frac1 1431794
Frac2 1429232
Frac3 1463357
Frac4 1475042
Frac5 1446016
Frac6 1472979
Frac7 1443244
Frac8 1467528
Out of range (1e16+0.5):
Frac1 1476556
Frac2 1458534
Frac3 1444431
Frac4 1427287
Frac5 1427326
Frac6 1427472
Frac7 1428914
Frac8 1419654
Only fraction (0.5):
Frac1 1470644
Frac2 1475227
Frac3 1447379
Frac4 1529162
Frac5 1509275
Frac6 1485185
Frac7 1500826
Frac8 1524294
Code address:
Frac1: 0000000000536423 (35)
Frac2: 0000000000536458 (88)
Frac3: 000000000053648D (13)
Frac4: 00000000005364C2 (66)
Frac5: 00000000005364F7 (119)
Frac6: 000000000053652C (44)
Frac7: 0000000000536561 (97)
Frac8: 0000000000536596 (22)
1st run:
In range (1e15+0.5):
Frac1 1349334
Frac2 1429198
Frac3 1447011
Frac4 1436476
Frac5 1477058
Frac6 1496887
Frac7 1431293
Frac8 1435460
Out of range (1e16+0.5):
Frac1 1349939
Frac2 1412543
Frac3 1462295
Frac4 1442081
Frac5 1512579
Frac6 1453593
Frac7 1457510
Frac8 1436533
Only fraction (0.5):
Frac1 1371353
Frac2 1443000
Frac3 1437583
Frac4 1415591
Frac5 1474870
Frac6 1437224
Frac7 1452196
Frac8 1453833
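The number in parentheses after each code address appears to be the address modulo 128. That reading is an inference, not something stated in the listings, but it matches every line of both of them. A small sketch that recomputes it for the first three entries of the first listing:

```pascal
program AlignOffset;

{$mode objfpc}

const
  // Entry addresses copied from the first "Code address" listing.
  Addrs: array[0..2] of QWord = ($536430, $536480, $5364D0);
var
  I: Integer;
begin
  for I := Low(Addrs) to High(Addrs) do
    // Offset of the function entry within a 128-byte block.
    WriteLn(HexStr(Int64(Addrs[I]), 16), ' (', Addrs[I] and 127, ')');
end.
```

In the first listing every offset is a multiple of 16, while the second listing starts the functions at odd offsets such as 35 and 13, which is what makes the two runs a test of alignment sensitivity.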
Also, it still outperforms Delphi's Frac in all cases.