[fpc-devel] Vectorisation, optimisation etc.
J. Gareth Moreton
gareth at moreton-family.com
Wed Mar 27 15:30:46 CET 2019

So, with the false start that was pure inline assembly behind us, I'd like to
talk about how to move forward with FPC, or at least with x86_64. To Florian
in particular: what is the current state of the compiler's support for the
256-bit YMM registers? I remember you telling me not to enable support for
them when I was implementing vectorcall, since you were still working on
it. I ask because my next big optimisation for x86_64 would be
vectorisation of things like for-loops, along with making use of FMA
instructions where allowed.
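
As a rough illustration of the kind of loop I mean (written in C only for
brevity; the function name and signature are made up for this sketch and are
not taken from any FPC code), each iteration below has the multiply-then-add
shape that could map onto a packed FMA instruction if the target permits it:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i] for n elements.
   Assuming x and y do not overlap, the iterations are independent, so a
   vectoriser could process 8 singles per 256-bit YMM register, and the
   multiply-add pair is a candidate for a fused multiply-add where FMA
   is allowed. */
void scale_and_add(size_t n, float a, const float *x, float *y)
{
    size_t i;

    assert(n == 0 || (x != NULL && y != NULL));

    for (i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

A fused multiply-add rounds once rather than twice, so results can differ
slightly from the separate multiply and add, which is why I would expect it
to be something the user has to allow.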

One thing to note with vectorcall is that it's not just for floating-point
parameters; it can pass integer-type vectors too. I'll have to
double-check my code, but vectorcall procedures may need a bit of metadata
so that the correct move instruction is used ((V)MOVAPS = single,
(V)MOVAPD = double, (V)MOVDQA = integer), since using the wrong one incurs a
notable performance penalty.
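
To make the metadata idea concrete, here is a minimal sketch of the mapping,
again in C for brevity. The type and function names are hypothetical and do
not correspond to anything in the compiler sources; the only part taken from
the above is the element-type-to-instruction table itself:

```c
#include <assert.h>

/* Hypothetical metadata: the element type carried by a vector parameter. */
typedef enum { VEC_SINGLE, VEC_DOUBLE, VEC_INTEGER } vec_elem_kind;

/* Aligned move instructions; the VEX-encoded forms (VMOVAPS etc.) follow
   the same split, so the prefix is left out of this sketch. */
typedef enum { OP_MOVAPS, OP_MOVAPD, OP_MOVDQA } move_op;

/* Pick the aligned move that matches the vector's element type. */
move_op aligned_move_for(vec_elem_kind kind)
{
    switch (kind) {
    case VEC_SINGLE:  return OP_MOVAPS;
    case VEC_DOUBLE:  return OP_MOVAPD;
    case VEC_INTEGER: return OP_MOVDQA;
    }

    assert(!"unknown vector element kind");
    return OP_MOVAPS;
}
```

The point is only that the element kind has to survive as far as the code
generator; how it would actually be stored for a vectorcall procedure is
something I still need to check.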

For something more cross-platform, I think pure functions are still on the
books but, as with a few other things, including some node-level
optimisations, the XML node dump would need merging first so that I and
others can more easily study the generated nodes for debugging and
development purposes.

By the way, is there any progress on the whole register virtualisation
thing? The part where virtual registers are changed into real ones occurs
before the peephole optimizer, whereas moving it to after that point would
allow for better register usage. When I blindly tried it myself I just
got a load of segmentation faults, so it's presumably not that
straightforward. That said, when programming my prototype Deep Optimizer
(which utilises assembly-level data flow analysis), I was able to remove the
need for some registers completely, but because it ran after virtual
registers were changed to real ones, said registers had already been
allocated on the stack.
Gareth aka. Kit