[fpc-devel] Optimization theory

Sat Jun 16 23:21:44 CEST 2018

 Note that I speak mostly from an x86_64 perspective, since this is where I
have almost universal exposure.

 So I've been pondering a few things after researching Florian's prototype
patch for optimisations done prior to register allocation, when the
pre-compiled assembly language utilises imaginary (virtual) registers
pretty much everywhere other than where distinct registers are required
(e.g. function parameters).  My question is... how much can be moved to
the pre-allocation stage?  I believe it can make the outcome far more
efficient, especially with my initial work on the deep optimizer as a few
people have stated already.  Unless I'm mistaken, from what I've observed,
a lot of the peephole optimizations don't actually care what registers
they're playing with.

 There are also situations in optimization where one might detect a repeat
calculation that the programmer cannot eliminate themselves, but which the
best optimization is to store the result and re-use it later... the most
obvious situation where this arises is with div and mod with the same
numerator and denominator.  Currently, the compiler doesn't know any
better and has to calculate the division twice, a relatively expensive
operation, even though DIV returns both the quotient and the remainder in
RAX and RDX respectively.  I believe storing the mod result in a virtual
register for later use will be far easier to manage when the registers have
not yet been allocated, especially if it's determined that a new register
has to be preserved in the function prologue, or there are no free
registers at all and it has to be put on the stack, or if it's at all
possible to use RDX itself as that temporary storage (the most ideal
outcome in both speed and size), something that would be near impossible if
RDX has been allocated for something in between.

 Speaking of the deep optimizer, I do have a patch ready to submit once the
backlog of other patches are analysed (it relies on a few of them), and
I'll be working on porting elements of it to the pre-allocation stage. 
Some bits are a little difficult because of how imaginary registers are
tracked... if you delete an instruction, you have to be careful you don't
leave a dangling pointer in the "live range".

 This is just thinking out loud and pondering.
 Gareth aka. Kit

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20180616/82bf50ab/attachment.html>