[fpc-devel] Detecting SSE and AVX compiler options
J. Gareth Moreton
gareth at moreton-family.com
Mon Feb 4 17:47:29 CET 2019
Oh whoops, sorry about that and not replying to the list.
I'll try not to screw up. Generally I think Double is preferred because
then everything uses SSE2 and no awkward ferrying of data between it and
the floating-point stack is required (come to think of it, only Win64
actually requires the presence of SSE2 and refuses to install if it's not
present).
Given that Florian prefers a node micro-optimisation for functions like
floor, it should be easy enough to check if the input is of type Single or
Double, and drop out if it's Extended (falling back to the actual source
code).
I've managed to program a cosine function that is about 8 times faster
than FCOS (FCOS is still faster if the input is integral, but other than 0
this is unrealistic since the angle is in radians), but it uses SSE2 rather
than x87. It could probably be ported to x87, but it will be slower because
more terms in the Maclaurin series of cosine will be required for the
precision of Extended, among other things. By adjusting the input to be
between 0 and pi, I only need 11 terms for Double, and using Horner's
method to factorise the series, the calculation becomes relatively simple:
each step involves subtracting the working value from a reciprocal
factorial (a constant), then multiplying by x², something that is easily
precomputed. Also, since I'm using SSE2 and each XMM register can hold
2 Doubles, I can easily adapt the function to compute the sine at the same
time (i.e. the sincos function, or a version in a custom library that takes
advantage of "vectorcall" and the System V ABI by having both return values
in XMM0).
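To make the Horner step concrete, here is a rough sketch in Python (whose
floats are IEEE Doubles) rather than SSE2 assembly. It is not the actual
routine: the name cos_horner, the RECIP_FACT table and the fmod-based range
reduction are all assumptions made for illustration, and "11 terms" is taken
to mean the series up to x^20/20!, counting the leading 1.

```python
import math

# Reciprocal factorials 1/(2k)! for k = 1..10, computed once up front.
# (Hypothetical table; the constants of the real routine are not shown.)
RECIP_FACT = [1.0 / math.factorial(2 * k) for k in range(1, 11)]


def cos_horner(x):
    """Approximate cos(x) with a truncated Maclaurin series in Horner form."""
    # Range reduction (an assumed approach): cosine is even and
    # 2*pi-periodic, so fold the argument into [0, pi].
    x = math.fmod(abs(x), 2.0 * math.pi)
    if x > math.pi:
        x = 2.0 * math.pi - x

    x2 = x * x  # x squared, computed once and reused in every step
    w = 0.0
    # Work from the highest-order term inwards: subtract the working
    # value from the reciprocal factorial, then multiply by x squared.
    for c in reversed(RECIP_FACT):
        w = (c - w) * x2
    # w now holds x^2/2! - x^4/4! + ..., so the cosine is 1 - w.
    return 1.0 - w
```

With this naive reduction the first omitted term at x = pi is pi^22/22!,
roughly 8e-11, so the sketch falls short of full Double precision near pi.
A real implementation would need a tighter reduction or other refinements
that are not described here.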
I figure that offering speed boosts for mathematical functions is a
worthwhile investment, especially in writing a game engine, for example
(which is my ambition!). sin and cos are certainly easier because they're
intrinsics pointing to internal functions, whereas floor is an actual
function in a supplementary unit.
Gareth aka. Kit
On Mon 04/02/19 13:19 , "Sven Barth" pascaldragon at googlemail.com sent:
Am Mo., 4. Feb. 2019, 14:15 hat J. Gareth Moreton geschrieben:
Oh right, okay, so x86_64-win64 is Double (even though Extended is
supported), but other x86_64 platforms are Extended, right? A little bit
odd, but I'll keep an eye out in that case.
Correct. Though Extended is not considered supported on Win64 by Microsoft
themselves and they warn against usage of the x87 (you need to explicitly
compile FPC with a specific define to enable Extended on Win64).
Regards, Sven
PS: you didn't answer to the list