[fpc-devel] Detecting SSE and AVX compiler options
J. Gareth Moreton
gareth at moreton-family.com
Mon Feb 4 20:28:40 CET 2019
I might hold off on this for a little bit until I get more out of my node
outputting feature, since I need to study the nodes produced by an inlined
Floor function carefully. For example, Floor's formal parameter is
further passed separately into Trunc and Frac - normally it's not a
problem, but if the actual parameter is a complex expression (i.e. isn't a
simple constant or variable), then it may produce even more nodes as it's
calculated twice, once for Trunc and once for Frac... or it's computed
beforehand and put into a temporary store that's hidden from the
programmer. I won't know for sure until I study the nodes and make a good
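A minimal sketch, written in C rather than Pascal, of the double-evaluation problem described above. It assumes Floor is built as "Trunc(x), minus 1 if Frac(x) is negative", and that the result fits in an int64_t. The names `eval_count`, `complex_expr`, `FLOOR_NAIVE` and `floor_with_temp` are hypothetical and exist only for illustration. The macro mimics naive inlining that substitutes the actual parameter wherever the formal one appears. The function mimics inlining that first stores the parameter in a hidden temporary.

```c
#include <stdint.h>

/* Hypothetical counter: how many times the actual parameter is evaluated. */
static int eval_count = 0;

/* Hypothetical "complex expression" standing in for a non-trivial argument. */
static double complex_expr(double a, double b)
{
    eval_count++;
    return a * b - 1.0;
}

/* Naive substitution: the argument expression is pasted into both the
   Trunc part and the Frac part, so here it is evaluated three times. */
#define FLOOR_NAIVE(e) \
    ((int64_t)(e) - ((((e) - (double)(int64_t)(e)) < 0.0) ? 1 : 0))

/* With a temporary: the argument is evaluated once into x (the parameter
   plays the role of the hidden temporary), and Trunc and Frac reuse it. */
static int64_t floor_with_temp(double x)
{
    int64_t t = (int64_t)x;        /* Trunc(x): C casts round toward zero */
    double frac = x - (double)t;   /* Frac(x): has the same sign as x */
    return t - (frac < 0.0 ? 1 : 0);
}
```

Both forms give the same result (for example, -3 for an argument of -2.5). The difference is that the naive form evaluates `complex_expr` three times, whereas the temporary form evaluates it once.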
I'll likely make 3 versions of the floor function (not including the
Pascal version that already exists, which the compiler can fall back on if
it's dealing with the "Extended" type, for example), one that uses SSE2,
one that uses SSE4.1 (which introduces the ROUNDSD instruction) and one
that uses AVX (which is effectively identical to the SSE4.1 one, albeit
using the VEX-encoded AVX instructions).
The node optimisation is definitely the better choice, thinking about it
now, also because if the compiler determines that the parameters are of
type Single, it can use the single-precision SSE instructions rather than
converting from Single to Double and back again. I just feel like this is
possibly a little bloated because it's the kind of optimisation that
belongs to an internal function rather than one in a supplementary unit...
unless you want to promote "floor" and similar functions from the Math unit
into internal functions through the System unit.
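A minimal C sketch of the single-precision point above, assuming the same Trunc/Frac construction as the Pascal version and a result that fits in an int32_t. The name `floor_single` is hypothetical. Every operation stays in `float`, so nothing is widened to `double` and narrowed back again. Compilers targeting x86 SSE typically emit CVTTSS2SI for the truncating cast, but that depends on the compiler and flags.

```c
#include <stdint.h>

/* Hypothetical single-precision floor: no float -> double -> float round trip. */
static int32_t floor_single(float x)
{
    int32_t t = (int32_t)x;         /* truncate toward zero */
    float frac = x - (float)t;      /* fractional part, same sign as x */
    return t - (frac < 0.0f ? 1 : 0);
}
```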
This is proving to be a fascinating learning experience, not just of
coding but also of design and discussion!
Gareth aka. Kit
On Mon 04/02/19 20:04 , "Florian Klämpfl" florian at freepascal.org sent:
On 04.02.19 at 17:47, J. Gareth Moreton wrote:
> Oh whoops, sorry about that and not replying to the list.
> I'll try not to screw up. Generally I think Double is preferred, since
> then everything uses SSE2 and no awkward ferrying of data between it and
> the floating-point stack is required (come to think of it, only Win64
> actually requires the presence of SSE2 and refuses to install if it's
> not present).
> Given that Florian prefers a node micro-optimisation for functions like
> floor, it should be easy enough to check if the input is of type Single
> or Double, and drop out if it's Extended (falling back to the actual
> source code).
Well, in the case of a node optimization in combination with inlining, I do not
see it as a real micro-optimization, as it results in the best code. That
is not the case if it is ifdef'ed assembler code in a unit, which is most
of the time not used (the fpc x86-64 RTL is normally built with -Cfsse2).
fpc-devel maillist - fpc-devel at lists.freepascal.org