[fpc-devel] Detecting SSE and AVX compiler options
Florian Klämpfl
florian at freepascal.org
Sun Feb 3 14:11:01 CET 2019
On 03.02.19 at 06:26, J. Gareth Moreton wrote:
> Hi everyone,
>
> So I'm looking to improve some of the mathematical routines. However,
> not all of them are internal functions; some are stored in the Math
> unit. Some of them are written in assembly language but use the old
> floating-point stack, or use a slow hack when there's a good alternative
> available in SSE 4.1, for example, and I would like to see about
> rewriting some of these functions for x86_64. However, while I can
> safely assume the presence of SSE2 on this architecture, what's the best
> way to detect if "-iCOREAVX" etc are specified? Also, if "-iCOREAVX",
> does it automatically set "-fAVX" as well? I would rather make sure I'm not
> making incorrect assumptions before I start writing assembly language
> routines.
>
> As an example of a function that can benefit from a speed-up under
> x86_64... the floor() and floor64() functions:
>
> function floor64(x: float): Int64;
> begin
> Result:=Trunc(x)-ord(Frac(x)<0);
> end;
>
> For time-critical code, this is not ideal because, besides being a
> function itself, it calls Trunc, Frac, has a subtraction, and another
> implicit subtraction and assignment due to the condition. Under SSE4.1,
> this could be optimised to something like the following:
Better to make it inline, detect the node pattern in the compiler, and
then generate the right instructions depending on the FPU switches.
While this is still a "micro" optimization, it gets the maximum benefit
this way: it does not clutter the RTL units with assembler, and user
code using similar sequences benefits from it as well.
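
A minimal sketch of the "make it inline" part of the suggestion above, kept in plain Pascal. It assumes Double in place of the Math unit's float type so that it compiles on its own, and the names Floor64Inline and Check are hypothetical, chosen only for this illustration. It does not show the node pattern detection itself, which would live inside the compiler.

```pascal
program floor64demo;

{$mode objfpc}

{ Same expression as the quoted floor64, but marked inline so that the
  compiler may expand it at the call site. Double stands in for the Math
  unit's float type to keep the sketch self-contained (an assumption). }
function Floor64Inline(x: Double): Int64; inline;
begin
  Result := Trunc(x) - Ord(Frac(x) < 0);
end;

{ Hypothetical helper: stop with a non-zero exit code on a mismatch. }
procedure Check(x: Double; expected: Int64);
begin
  if Floor64Inline(x) <> expected then
  begin
    WriteLn('mismatch for ', x:0:2);
    Halt(1);
  end;
end;

begin
  Check(2.5, 2);    { positive fraction: truncation already rounds down }
  Check(-2.5, -3);  { negative fraction: subtract one after truncation }
  Check(-3.0, -3);  { whole negative number: no correction }
  Check(0.0, 0);
  WriteLn('ok');
end.
```

Because the routine stays in Pascal, a compiler that recognises the Trunc/Frac sequence could in principle replace it with a single rounding instruction where the selected FPU type allows it, without any change to this source.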