[fpc-devel] Re: Broken frac function in FPC3.1.1 / Windows x86_64
Thorsten Engler
thorsten.engler at gmx.net
Sun Apr 29 10:04:28 CEST 2018
> From: fpc-devel <fpc-devel-bounces at lists.freepascal.org> On Behalf Of J. Gareth Moreton
> Sent: Sunday, 29 April 2018 12:36
> As an extra point, removing the 'skip' check (i.e. cmp ax, $3FE0, jbe @@skip)
> removes 6 bytes from the code size and shaves about 2 to 3 nanoseconds off
> the execution time in most cases, and it could be argued that it's worth
> going for the 'no skip' version because using Frac on a value of x where
> |x| < 1 is rather uncommon compared to when |x| >= 1.
I agree that calling Frac on values that are already purely fractional (|x| < 1) is probably not going to happen very often.
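For readers following along, the 'skip' and 'zero' paths can be illustrated with a rough Python sketch that classifies a double by its biased IEEE 754 exponent. This is not the FPC code, and the function names are hypothetical. The thread doesn't show what the assembly loads into ax before cmp ax, $3FE0, so the sketch assumes a plain |x| < 1 test (biased exponent below 1023) for the skip path and a 2^52 cut-off for the zero path.

```python
import struct

def biased_exponent(x: float) -> int:
    """Return the 11-bit biased IEEE 754 exponent of a double."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (bits >> 52) & 0x7FF

def frac_sketch(x: float, use_skip: bool = True) -> float:
    """Hypothetical Frac(x) = x - Int(x); not FPC's implementation."""
    e = biased_exponent(x)
    if use_skip and e < 1023:   # |x| < 1: already a pure fraction ("skip" path)
        return x
    if e >= 1023 + 52:          # |x| >= 2**52, inf or NaN: no fractional bits ("zero" path)
        return x - x            # 0.0 for finite x, NaN for inf/NaN
    return x - float(int(x))    # int() truncates toward zero; exact below 2**52

for value in (1e15 + 0.5, 1e16 + 0.5, 1e300, 0.5, -2.75):
    print(value, frac_sketch(value), frac_sketch(value, use_skip=False))
```

Passing use_skip=False gives the same results, because int() of a value below 1 is 0. The skip branch only saves work, which is the size/speed trade-off being discussed.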
> However, when running my timing tests, one thing that's confused me
> is that when using very large inputs like 10^300, the function is
> at least 5 nanoseconds slower than FracSkip2, even though the code
> is less complex. This happens even if I put 'align 16' before the @@zero label.
I do not see any noticeable difference between 1e16 and 1e300 as inputs:
Code address (offset within a 128-byte block in parentheses):
Frac1: 0000000000536440 (64)
Frac2: 0000000000536490 (16)
Frac3: 00000000005364E0 (96)
Frac4: 0000000000536530 (48)
Frac5: 0000000000536580 (0)
Frac6: 00000000005365D0 (80)
Frac7: 0000000000536620 (32)
Frac8: 0000000000536670 (112)
1st run:
        In range     Out of range   Out of range   Only fraction
        (1e15+0.5)   (1e16+0.5)     (1e300)        (0.5)
Frac1   923470       893526         897274         954220
Frac2   964422       998532         986679         1046781
Frac3   967501       894644         899123         993959
Frac4   1027080      993987         999495         1015032
Frac5   1005352      895353         899438         1013128
Frac6   1052105      994606         989588         1043157
Frac7   1011983      900848         885060         928712
Frac8   1048743      992751         985288         988220
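The lack of a difference between the two out-of-range columns is what one would expect if, as the input labels suggest, the cut-off sits at 2^52. From there up, adjacent doubles are at least 1.0 apart, so 1e16+0.5 already rounds to 1e16 before Frac ever sees it, and 1e16 and 1e300 take the same path. A quick Python check (the helper name is hypothetical):

```python
import math

TWO_52 = 2.0 ** 52  # 4503599627370496.0: from here up, adjacent doubles are >= 1.0 apart

def is_out_of_range(x: float) -> bool:
    """True when |x| >= 2**52, i.e. every finite double this large is an integer."""
    return abs(x) >= TWO_52

for x in (1e15 + 0.5, 1e16 + 0.5, 1e300, 0.5):
    # math.ulp (Python 3.9+) gives the gap to the next representable double
    print(f"{x!r:>24}  out of range: {is_out_of_range(x)}  spacing: {math.ulp(x)}")
```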
The timings also seem relatively resilient to changes in code alignment, even when the entry points are not multiples of 16:
Code address (offset within a 128-byte block in parentheses):
Frac1: 0000000000536433 (51)
Frac2: 000000000053645D (93)
Frac3: 0000000000536487 (7)
Frac4: 00000000005364B1 (49)
Frac5: 00000000005364DB (91)
Frac6: 0000000000536505 (5)
Frac7: 000000000053652F (47)
Frac8: 0000000000536559 (89)
1st run:
        In range     Out of range   Out of range   Only fraction
        (1e15+0.5)   (1e16+0.5)     (1e300)        (0.5)
Frac1   946247       883588         902103         945212
Frac2   904187       877412         901861         904468
Frac3   902870       809785         802404         915325
Frac4   1025163      831095         808002         997584
Frac5   931021       976555         972999         945569
Frac6   895990       711201         710888         898036
Frac7   1050683      791657         804050         1071561
Frac8   952305       897085         875901         906152
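The number in parentheses after each code address appears to be the entry point's offset within a 128-byte block (address mod 128), which holds for every address listed in both layouts. A small Python helper (hypothetical, just for checking layouts like these) reproduces that figure and flags 16-byte alignment:

```python
def alignment_report(name: str, addr: int, block: int = 128) -> str:
    """Describe where a function entry point falls relative to 16- and 128-byte boundaries."""
    offset = addr % block          # the figure shown in parentheses in the listings
    aligned16 = addr % 16 == 0     # True for every entry in the first layout
    return f"{name}: {addr:016X} ({offset}) 16-byte aligned: {aligned16}"

# Entry points from the second (unaligned) layout
addresses = {
    "Frac1": 0x536433, "Frac2": 0x53645D, "Frac3": 0x536487, "Frac4": 0x5364B1,
    "Frac5": 0x5364DB, "Frac6": 0x536505, "Frac7": 0x53652F, "Frac8": 0x536559,
}
for name, addr in addresses.items():
    print(alignment_report(name, addr))
```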
> Nevertheless, I conclude that for most situations, using the improved
> FracNoSkip gives the best performance and size for typical inputs,
> but this may depend on an individual machine's architecture.
Seems we got a winner.
I had considered writing the ret like that, but didn't, because I was worried that SEH under Windows expects function prologues and epilogues to match a specific pattern exactly. In hindsight, though, this is a leaf function with no stack frame (on Win64, a function that allocates no stack space, calls nothing and saves no non-volatile registers needs no unwind data), so I don't think that matters.
Cheers,
Thorsten