[fpc-devel] Broken frac function in FPC3.1.1 / Windows x86_64

J. Gareth Moreton gareth at moreton-family.com
Sun Apr 29 04:36:16 CEST 2018


 As an extra point, removing the 'skip' check (i.e. cmp ax, $3FE0, jbe
@@skip) removes 6 bytes from the code size and shaves about 2 to 3
nanoseconds off the execution time in most cases, and it could be argued
that it's worth going for the 'no skip' version because using Frac on a
value of x where |x| < 1 is rather uncommon compared to when |x| >= 1.

 However, when running my timing tests, one thing that's confused me is
that when using very large inputs like 10^300, the function is at least 5
nanoseconds slower than FracSkip2, even though the code is less complex.
This happens even if I put 'align 16' before the @@zero label.

 I did wonder if it being a debug build caused some issues, but when I
compiled it with full optimisation, both versions of the function ran
slower for numbers of that size (and the original FracDoSkip took about
as long), and SafeFrac beat them by around 5 nanoseconds.

 Nevertheless, I conclude that for most situations, using the improved
FracNoSkip gives the best performance and size for typical inputs, but this
may depend on an individual machine's architecture.

 ****

 function FracNoSkp2(const X: ValReal): ValReal; assembler; nostackframe;
 asm
   movq      rax,  xmm0
   shr       rax,  48
   and       ax,   $7FF0
   cmp       ax,   $4330
   jge       @@zero
   cvttsd2si rax,  xmm0
   cvtsi2sd  xmm4, rax
   subsd     xmm0, xmm4
   ret
 @@zero:
   xorpd     xmm0, xmm0
 end;

 ****
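
 For anyone who would rather read Pascal than assembly, the exponent test
above can be sketched roughly as follows.  This is only an illustration of
the logic, not code from the test project: FracRef is a made-up name, and
it assumes X is an IEEE 754 double stored in 64 bits.

```pascal
program fracref;
{$mode objfpc}

// Hypothetical helper (not from the attached project): a plain-Pascal
// rendering of the exponent test used by FracNoSkp2.
function FracRef(X: Double): Double;
var
  Bits: QWord;
  ExpField: Word;
begin
  Bits := 0;
  Move(X, Bits, SizeOf(Bits));              // copy the raw 64-bit pattern
  ExpField := Word(Bits shr 48) and $7FF0;  // keep exponent bits, drop sign
  if ExpField >= $4330 then                 // |X| >= 2^52, or Inf/NaN
    Result := 0.0                           // no fractional bits remain
  else
    Result := X - Trunc(X);                 // truncate toward zero, then subtract
end;

begin
  WriteLn(FracRef(2.75):0:4);
  WriteLn(FracRef(-2.75):0:4);
  WriteLn(FracRef(1.0e300):0:4);
end.
```

 The constant $4330 corresponds to a biased exponent of $433 = 1075, i.e.
2^52, the point from which every double is already a whole number.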

 Note: 'align 16' at the start of a procedure is usually unnecessary, as
FPC aligns procedures to 16-byte boundaries automatically.  FracNoSkp2 has
a code size of 39 bytes, so will fill 48 bytes (3 blocks), which is a block
smaller than the original FracNoSkip and the current Frac function.
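
 The block arithmetic is simply the code size rounded up to the next
multiple of 16.  A small sketch, where PaddedSize is a hypothetical helper
and the 16-byte alignment is the figure quoted in the note:

```pascal
program padsize;
{$mode objfpc}

// Hypothetical helper: round a code size up to the next multiple of the
// alignment (16 bytes per block, as stated in the note).
function PaddedSize(CodeSize, Alignment: LongWord): LongWord;
begin
  Result := ((CodeSize + Alignment - 1) div Alignment) * Alignment;
end;

begin
  WriteLn(PaddedSize(39, 16));  // 39 bytes occupy 48 bytes, i.e. 3 blocks
end.
```

 By the same arithmetic, a routine that fills four blocks must be between
49 and 64 bytes long.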

 I've attached my test project to this e-mail if you wish to look at the
figures yourselves (I hope attachments work) and make a more informed
decision.  This will currently only run on Windows due to the use of
QueryPerformanceCounter for timing checks. These calls will need to be
removed to run this on Linux.
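
 As a possible alternative to removing the timing calls, a portable (if
much coarser) measurement could be built on GetTickCount64 from SysUtils.
The sketch below is not what fractest.lpr does: the iteration count and the
NsPerCall helper are assumptions, and because the tick counter has
millisecond granularity at best, the loop has to run many times and the
result includes loop overhead.

```pascal
program fractiming;
{$mode objfpc}
uses
  SysUtils;

const
  Iterations = 50000000;  // assumed count, large enough to span many ms

// Hypothetical helper: average nanoseconds per call from a total in ms.
function NsPerCall(ElapsedMs, Calls: QWord): Double;
begin
  Result := (ElapsedMs * 1.0e6) / Calls;
end;

var
  I: LongInt;
  Start, Elapsed: QWord;
  X, Sum: Double;
begin
  X := 1234.5678;
  Sum := 0.0;
  Start := GetTickCount64;
  for I := 1 to Iterations do
    Sum := Sum + Frac(X + I);          // accumulate so the calls are not dead code
  Elapsed := GetTickCount64 - Start;
  WriteLn('Total:    ', Elapsed, ' ms');
  WriteLn('Per call: ', NsPerCall(Elapsed, Iterations):0:2, ' ns (incl. loop)');
  WriteLn('Checksum: ', Sum:0:4);
end.
```

 Differences of 2 to 5 nanoseconds should still show up in the per-call
average, but only if the run is long enough to swamp the timer granularity.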

 Gareth aka. Kit

 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fractest.lpr
Type: application/octet-stream
Size: 7259 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20180429/d5500310/attachment.obj>