[fpc-devel] Detecting SSE and AVX compiler options

Florian Klämpfl florian at freepascal.org
Sun Feb 3 14:11:01 CET 2019


On 03.02.19 at 06:26, J. Gareth Moreton wrote:
> Hi everyone,
> 
> So I'm looking to improve some of the mathematical routines.  However, 
> not all of them are internal functions; some are stored in the Math 
> unit.  Some of them are written in assembly language but use the old 
> floating-point stack, or use a slow hack when there's a good alternative 
> available in SSE 4.1, for example, and I would like to see about 
> rewriting some of these functions for x86_64.  However, while I can 
> safely assume the presence of SSE2 on this architecture, what's the best 
> way to detect if "-iCOREAVX" etc. are specified?  Also, if "-iCOREAVX" 
> is specified, does it automatically set "-fAVX" as well?  I'd rather 
> make sure I'm not making incorrect assumptions before I start writing 
> assembly language routines.
> 
> As an example of a function that can benefit from a speed-up under 
> x86_64... the floor() and floor64() functions:
> 
> function floor64(x: float): Int64;
>    begin
>      Result:=Trunc(x)-ord(Frac(x)<0);
>    end;
> 
> For time-critical code, this is not ideal because, besides being a 
> function itself, it calls Trunc, Frac, has a subtraction, and another 
> implicit subtraction and assignment due to the condition.  Under SSE4.1, 
> this could be optimised to something like the following:

Better to make it inline, detect the node pattern in the compiler, and 
then generate the right instructions depending on the FPU switches. 
While this is still a "micro" optimization, this way it has its maximum 
benefit: it does not clutter the RTL units with assembler, and user code 
using similar sequences benefits from it as well.
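
A minimal sketch of the "make it inline" half of this suggestion, 
assuming plain Pascal and no assembler. The name Floor64Inline is 
hypothetical and not an RTL routine; the expression is the one quoted 
above. Detecting the node pattern and emitting SSE4.1 code happens 
inside the compiler and is not shown here.

```pascal
program Floor64InlineSketch;

{$mode objfpc}
{$inline on}

{ Hypothetical name, not part of the RTL. Same expression as the quoted
  floor64, but declared inline so the call overhead can be removed at
  the call site. }
function Floor64Inline(x: Double): Int64; inline;
begin
  { Trunc rounds toward zero, so subtract 1 when the fractional part is
    negative, i.e. for negative non-integers. }
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;

begin
  WriteLn(Floor64Inline(2.5));   { prints 2 }
  WriteLn(Floor64Inline(-2.5));  { prints -3 }
  WriteLn(Floor64Inline(-3.0));  { prints -3 }
end.
```

Keeping the routine in Pascal leaves the choice of instructions to the 
code generator, so the same source can follow whatever FPU switches are 
in effect.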


