[fpc-devel] Detecting SSE and AVX compiler options

J. Gareth Moreton gareth at moreton-family.com
Sun Feb 3 17:27:03 CET 2019

 It's certainly possible, but feels a little finicky, since floor64 is not
an internal function unlike, say, the trigonometric functions.  It will
also break if the original code is changed.  It feels like a kludge,
especially if another programmer down the line tries to rewrite the
function and is suddenly confused when the execution speed turns out slower
because the node pattern is no longer identical.

  The intention though was to put the improved code, with pre-processor
directives to detect the FPU switches, in the platform-specific include
file and wrap the original procedure in a "{$ifndef FPC_MATH_HAS_FLOOR64}",
similar to how other functions in the Math unit are programmed (e.g.

 To reassure, I'm aware that "float" is normally "extended" outside of
x86_64, and I would keep my changes constrained to that platform.
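
 As a rough illustration of the guard described above (a minimal sketch,
not the actual Math unit source): the symbol FPC_MATH_HAS_FLOOR64 is taken
from this message, while the FPUAVX test is only an assumption about which
symbol the compiler defines for the FPU switch, and the "optimised" body is
a placeholder that simply reuses the generic formula.

```pascal
program Floor64Guard;
{$mode objfpc}

{ Hypothetical stand-in for a platform-specific include file.
  Assumption: FPUAVX is defined when an AVX FPU type is selected. }
{$if defined(CPUX86_64) and defined(FPUAVX)}
  {$define FPC_MATH_HAS_FLOOR64}
function floor64(x: Double): Int64;
begin
  { Placeholder body: a real version would use SSE 4.1/AVX rounding here. }
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;
{$endif}

{ Generic fallback, compiled only when no platform version was provided. }
{$ifndef FPC_MATH_HAS_FLOOR64}
function floor64(x: Double): Int64;
begin
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;
{$endif}

begin
  WriteLn(floor64(-1.5));
  WriteLn(floor64(2.75));
end.
```

 Either way exactly one floor64 is compiled, so callers do not need to
know which variant was selected.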

 Regarding Trunc, I'm aware that it's just "cvttsd2si %xmm0,%rax", but
being assembly language, it's currently impossible to inline. Admittedly,
the ability to inline at least simple assembler routines is something I
would like to develop and implement at some point: temporary registers
could be replaced with virtual registers, and the compiler could detect
registers that map onto parameters and return values. It would be very
platform-specific, but since "inline" is just ignored if it can't be
used, it wouldn't be an error situation.
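
 For illustration only, a standalone assembler routine of the kind
discussed above might look like the sketch below. The name TruncViaSSE2 is
hypothetical, and the code assumes the first Double argument arrives in
XMM0 and the Int64 result is returned in RAX; other targets fall back to
the built-in Trunc.

```pascal
program TruncAsm;
{$mode objfpc}

{$ifdef CPUX86_64}
{$asmmode att}
{ Hypothetical wrapper: a single truncating conversion, Double -> Int64.
  Assumes x is passed in XMM0 and the result is returned in RAX. }
function TruncViaSSE2(x: Double): Int64; assembler; nostackframe;
asm
  cvttsd2si %xmm0, %rax
end;
{$else}
{ Portable fallback for targets without this instruction. }
function TruncViaSSE2(x: Double): Int64;
begin
  Result := Trunc(x);
end;
{$endif}

begin
  { Truncation rounds towards zero. }
  WriteLn(TruncViaSSE2(-1.5));
  WriteLn(TruncViaSSE2(2.75));
end.
```

 Because the body is assembler, every use still costs a full call, even
though the routine is a single instruction.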

 Gareth aka. Kit
 P.S. Documentation specifically states that the Floor function rounds
towards negative infinity, unlike Trunc, which rounds towards zero.
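
 A small sketch of that difference, using Trunc from the System unit and
Floor from the Math unit. FloorViaTrunc is a hypothetical helper name; it
shows how a floor can be built from Trunc by subtracting one when the
fractional part is negative.

```pascal
program FloorVsTrunc;
{$mode objfpc}
uses
  Math;

{ Hypothetical helper: floor built from Trunc plus a correction
  that applies only to negative non-integer inputs. }
function FloorViaTrunc(x: Double): Int64;
begin
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;

var
  x: Double;
begin
  x := -1.5;
  WriteLn('Trunc:         ', Trunc(x));          { rounds towards zero }
  WriteLn('Floor:         ', Floor(x));          { rounds towards -infinity }
  WriteLn('FloorViaTrunc: ', FloorViaTrunc(x));
end.
```

 The two functions only disagree for negative inputs with a non-zero
fractional part; for everything else the correction term is zero.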

 On Sun 03/02/19 13:11 , Florian Klämpfl florian at freepascal.org sent:
 On 03.02.19 at 06:26, J. Gareth Moreton wrote: 
 > Hi everyone, 
 > So I'm looking to improve some of the mathematical routines.  However, 
 > not all of them are internal functions and are stored in the Math 
 > unit.  Some of them are written in assembly language but use the old 
 > floating-point stack, or use a slow hack when there's a good alternative 
 > available in SSE 4.1, for example, and I would like to see about 
 > rewriting some of these functions for x86_64.  However, while I can 
 > safely assume the presence of SSE2 on this architecture, what's the best 
 > way to detect if "-iCOREAVX" etc are specified?  Also, if "-iCOREAVX", 
 > does it automatically set "-fAVX" as well?  I'd rather make sure I'm not 
 > making incorrect assumptions before I start writing assembly language 
 > routines. 
 > As an example of a function that can benefit from a speed-up under 
 > x86_64... the floor() and floor64() functions: 
 > function floor64(x: float): Int64; 
 >   begin 
 >     Result:=Trunc(x)-ord(Frac(x)<0); 
 >   end; 
 > For time-critical code, this is not ideal because, besides being a 
 > function itself, it calls Trunc, Frac, has a subtraction, and another 
 > implicit subtraction and assignment due to the condition.  Under
 > this could be optimised to something like the following: 

 Better make it inline, detect the node pattern and then generate the 
 right instructions depending on the fpu switches. While this is still a 
 "micro" optimization, it has its maximum benefit this way: it does not 
 clutter the rtl units with assembler, and user code using similar 
 sequences benefits from it as well. 