[fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64
Thorsten Engler
thorsten.engler at gmx.net
Sat Apr 28 10:11:39 CEST 2018
Oops, small mistake caused by a last-minute change (I replaced rol with shl): it needs to be shr (or ror or rol; they all perform about the same on my CPU).
And in case anyone wonders, the first cmp and branch returns 0 for numbers with an exponent of 52 or more (|x| >= 2^52): these have no fractional bits left, and that range includes everything that would cause an integer overflow. The 2nd cmp and branch skips the whole thing if the input is between -1 and 1 (so it already is just a fraction).
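To make the two checks concrete, here is a rough plain-Pascal sketch of which branch the mask and compares select. ExpField and Classify are names made up for this illustration and are not part of any RTL. The signed jge is modelled with an unsigned compare, which should be equivalent here because the masked value never exceeds $7FF00000.

```pascal
program ExpFieldDemo;
{$ifdef FPC}{$mode objfpc}{$endif}

const
  ExpMask = $7FF00000; // exponent bits inside the high dword of a Double
  NoFrac  = $43300000; // biased exponent $433 = 1023 + 52, so |x| >= 2^52
  MaxSkip = $3FE00000; // biased exponent $3FE = 1023 - 1, so |x| < 1

// Illustrative helper: masked high dword, as edx holds it after the "and".
function ExpField(const X: Double): Cardinal;
var
  Bits: UInt64;
begin
  Move(X, Bits, SizeOf(Bits));              // copy the raw IEEE 754 bits
  Result := Cardinal(Bits shr 32) and ExpMask;
end;

// Illustrative helper: names the path the assembler would take.
function Classify(const X: Double): string;
var
  E: Cardinal;
begin
  E := ExpField(X);
  if E >= NoFrac then
    Result := 'zero'       // first cmp/branch: return 0
  else if E <= MaxSkip then
    Result := 'skip'       // second cmp/branch: input is returned unchanged
  else
    Result := 'convert';   // falls through to cvttsd2si / cvtsi2sd / subsd
end;

begin
  WriteLn(Classify(1e15 + 0.5)); // the "in range" test value
  WriteLn(Classify(1e16 + 0.5)); // the "out of range" test value
  WriteLn(Classify(0.5));        // the "only fraction" test value
end.
```

The three calls use the same inputs as the timings in this message, so each one exercises a different path through the function.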
Also, at least on my CPU (AMD Phenom II X6, so not exactly the newest) the effect of code alignment on performance is huge. (It affects the branch predictor I think.) I’m not sure what the best alignment is for all CPUs.
.align 16 forces alignment of the function entry point to a multiple of 16. If I add between 0 and 3 nops at the start of the function, the timings for calling it 10 million times are:
In range (1e15+0.5):
0 nop 1266149
1 nop 4260343
2 nop 1369745
3 nop 4469482
Out of range (1e16+0.5):
0 nop 881536
1 nop 896240
2 nop 890805
3 nop 871582
Only fraction (0.5):
0 nop 894850
1 nop 2219469
2 nop 955618
3 nop 2303233
Leaving out the check if it’s already a fraction decreases the time for in-range numbers and increases it for ones that are already a fraction:
In range (1e15+0.5):
do skip 1306063
no skip 1121395
Out of range (1e16+0.5):
do skip 887081
no skip 888925
Only fraction (0.5):
do skip 903330
no skip 1124026
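function FracDoSkip(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0         // raw bits of X
  rol       rdx, 32           // swap halves: high dword (sign, exponent) is now in edx
  and       edx, $7FF00000    // isolate the exponent field
  cmp       edx, $43300000    // exponent >= 52: no fractional bits left
  jge       @@zero
  cmp       edx, $3FE00000    // exponent <= -1: |X| < 1, already a fraction
  jbe       @@skip
  cvttsd2si rax, xmm0         // rax := Trunc(X)
  cvtsi2sd  xmm4, rax         // xmm4 := Double(rax)
  subsd     xmm0, xmm4        // Result := X - xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0        // Result := 0
@@skip:
end;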
function FracNoSkip(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0
  rol       rdx, 32
  and       edx, $7FF00000
  cmp       edx, $43300000
  jge       @@zero
  // cmp    edx, $3FE00000
  // jbe    @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0
@@skip:
end;
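For cross-checking the assembler against something easier to read, a plain-Pascal model of the same logic could look like the sketch below. FracModel is a hypothetical name for this illustration and is not the RTL's Frac. It assumes an FPC build and says nothing about speed or alignment.

```pascal
program FracModelDemo;
{$ifdef FPC}{$mode objfpc}{$endif}

// Hypothetical reference model that mirrors FracDoSkip step by step.
function FracModel(const X: Double): Double;
var
  Bits: UInt64;
  E: Cardinal;
  I: Int64;
begin
  Move(X, Bits, SizeOf(Bits));               // movq rdx, xmm0
  E := Cardinal(Bits shr 32) and $7FF00000;  // high dword, exponent field only
  if E >= $43300000 then
    Result := 0.0                            // @@zero: |X| >= 2^52, Inf or NaN
  else if E <= $3FE00000 then
    Result := X                              // @@skip: |X| < 1
  else
  begin
    I := Trunc(X);                           // cvttsd2si (cannot overflow here)
    Result := X - I;                         // cvtsi2sd + subsd
  end;
end;

begin
  WriteLn(FracModel(1e15 + 0.5):0:3);
  WriteLn(FracModel(1e16 + 0.5):0:3);
  WriteLn(FracModel(0.5):0:3);
  WriteLn(FracModel(-2.25):0:3);
end.
```

Because the conversion is only reached for |X| < 2^52, Trunc(X) always fits in an Int64 and the subtraction is exact, which is the point of the first check.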
Cheers,
Thorsten
From: fpc-devel <fpc-devel-bounces at lists.freepascal.org> On Behalf Of Thorsten Engler
Sent: Saturday, 28 April 2018 15:37
To: 'FPC developers' list' <fpc-devel at lists.freepascal.org>
Subject: Re: [fpc-devel] *** GMX Spamverdacht *** Re: Broken frac function in FPC3.1.1 / Windows x86_64
I’ve only tested it in Delphi, so you’ll have to convert it to the right syntax for fpc, but either of these should do:
function Frac1(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0
  shl       rdx, 32
  and       edx, $7FF00000
  cmp       edx, $43300000
  jge       @@zero
  cmp       edx, $3FE00000
  jbe       @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0
@@skip:
end;
function Frac2(const X: Double): Double;
asm
  .align 16
  .noframe
  movq      rdx, xmm0
  shl       rdx, 48
  and       dx, $7FF0
  cmp       dx, $4330
  jge       @@zero
  cmp       dx, $3FE0
  jbe       @@skip
  cvttsd2si rax, xmm0
  cvtsi2sd  xmm4, rax
  subsd     xmm0, xmm4
  jmp       @@skip
@@zero:
  xorpd     xmm0, xmm0
@@skip:
end;
From: fpc-devel <fpc-devel-bounces at lists.freepascal.org> On Behalf Of Sven Barth via fpc-devel
Sent: Friday, 27 April 2018 23:47
To: FPC developers' list <fpc-devel at lists.freepascal.org>
Cc: Sven Barth <pascaldragon at googlemail.com>
Subject: *** GMX Spamverdacht *** Re: [fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64
Bart <bartjunk64 at gmail.com> wrote on Fri, 27 Apr 2018, 13:42:
On Wed, Apr 25, 2018 at 11:04 AM, <info at wolfgang-ehrhardt.de> wrote:
> If you compile and run this 64-bit program on Win 64 you get a crash
And AFAICS your analysis of the cause (see bugtracker) is correct as well.
function fpc_frac_real(d: ValReal): ValReal; compilerproc; assembler; nostackframe;
asm
  cvttsd2si %xmm0,%rax
  { Windows defines %xmm4 and %xmm5 as first non-parameter volatile registers;
    on SYSV systems all are considered as such, so use %xmm4 }
  cvtsi2sd  %rax,%xmm4
  subsd     %xmm4,%xmm0
end;
CVTTSD2SI — Convert with Truncation Scalar Double-Precision
Floating-Point Value to Signed Integer
This should not be used to get a ValReal result.
The code essentially does the following (instruction by instruction):
=== code begin ===
tmpi := int64(trunc(d));
tmpd := double(tmpi);
Result := d - tmpd;
=== code end ===
Though why it fails with the given value is a different topic...
Regards,
Sven