[fpc-devel] Optimization theory

J. Gareth Moreton gareth at moreton-family.com
Sun Jun 17 12:36:38 CEST 2018

 That's where the first stage of my deep optimizer might be able to help,
since it explicitly starts at a MOV instruction (say, mov %reg_source,
%reg_dest) and scans forward to see if %reg_dest can be replaced with
%reg_source. It stops when it hits a jump, a call or a non-skippable label,
when one of the registers changes value, or when %reg_dest cannot be
replaced.  If all references to %reg_dest were replaced before its value is
completely overwritten, it can then delete the original MOV.
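
 To make the scan above concrete, here is a minimal sketch in Python. It is
not the code from the patch: the Ins record, the pinned flag, the STOP_OPS set
and forward_substitute are all made-up names, and it assumes a toy
three-address form where each instruction simply lists the registers it reads
and writes. It keeps any substitutions it made even when the MOV has to stay,
which is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Ins:
    """Toy instruction (hypothetical model, not FPC's internal representation)."""
    op: str
    reads: Tuple[str, ...] = ()    # registers whose value is read
    writes: Tuple[str, ...] = ()   # registers completely overwritten
    pinned: bool = False           # True if its operands cannot be renamed


# Instructions that end the scan (assumed: every label is non-skippable).
STOP_OPS = frozenset({"jmp", "call", "label"})


def forward_substitute(code: List[Ins], i: int) -> Tuple[List[Ins], bool]:
    """Scan forward from the MOV at index i, replacing reads of its
    destination with its source. Returns (new_code, mov_deleted)."""
    mov = code[i]
    if mov.op != "mov" or len(mov.reads) != 1 or len(mov.writes) != 1:
        return list(code), False
    src, dst = mov.reads[0], mov.writes[0]
    out = list(code)
    for j in range(i + 1, len(out)):
        ins = out[j]
        if ins.op in STOP_OPS:
            break                      # jump, call or label: stop searching
        if dst in ins.reads:
            if ins.pinned:
                break                  # dst cannot be replaced here
            new_reads = tuple(src if r == dst else r for r in ins.reads)
            out[j] = replace(ins, reads=new_reads)
        if dst in ins.writes:
            # dst is overwritten and every earlier read was replaced,
            # so the original MOV is dead and can be deleted.
            del out[i]
            return out, True
        if src in ins.writes:
            break                      # src changed value: stop searching
    return out, False                  # dst may still be live: keep the MOV


example = [
    Ins("mov", ("r1",), ("r2",)),        # mov %r1,%r2
    Ins("add", ("r2", "r3"), ("r4",)),   # reads %r2, so it becomes %r1
    Ins("mov", ("r5",), ("r2",)),        # %r2 overwritten: first MOV goes
]
optimized, deleted = forward_substitute(example, 0)
```

 The early exits are what keep the scan cheap: it never crosses a jump, call
or label, so it only ever looks at straight-line code after the MOV.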

 While it will only be appropriate to run at -O2 and -O3, the restrictions
surrounding the register replacement, and especially where it stops
searching, help ensure that it runs relatively quickly.

 Find attached a test patch of the deep optimizer. It was intertwined with
a number of other peephole changes, so I hope these were removed
successfully and won't cause a compiler error.  (This is what I intend to
submit to the bug reporter as a patch eventually.)  Running the compiler
with DEBUG_AOPTCPU enabled will show where the deep optimizer has made
savings in the .s files.

 Gareth aka. Kit

 On Sun 17/06/18 09:56 , Florian Klämpfl florian at freepascal.org sent:
 On 16.06.2018 at 23:21, J. Gareth Moreton wrote:
 > Note that I speak mostly from an x86_64 perspective, since this is where
 > I have almost universal exposure.
 > So I've been pondering a few things after researching Florian's
 > prototype patch for optimisations done prior to register
 > allocation, when the pre-compiled assembly language utilises imaginary
 > (virtual) registers pretty much everywhere other
 > than where distinct registers are required (e.g. function parameters).
 > My question is... how much can be moved to the
 > pre-allocation stage?

 A lot, basically everything which reduces register pressure. The only
problem is that, at this stage, the code contains a lot of moves (compile
with -sr to see what it looks like), so the optimizer must be able to
handle this. It might even be possible to build a generic optimizer pass
at this stage. Example:

 A typical sequence FPC often generates is: 

 mov %src1,%dest1 
 add %dest1,%src2,%dest2 

 If src1 is not released after the mov but dest1 is released, src1 and
dest1 still cannot be coalesced as they interfere, so an extra register is
allocated. The move will be removed by the peephole optimizer, but the
register was already allocated and increased register pressure. Such
optimizations could be done generically (for all CPUs): if the destination
of a mov is only read afterwards (this information is already generically
available), the mov can be removed and, in this case, dest1 can be replaced
by src1.
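
 As a rough illustration of the generic rule described above, the following
Python sketch works on a single straight-line block of virtual-register code.
The (opcode, destination, sources) tuple layout and the coalesce_moves name
are invented for this example and are not FPC's internal representation. It
also adds one safety condition that is an assumption here: the source must
not be written again after the mov. It does not model register release
information at all.

```python
from typing import List, Tuple

# (opcode, destination register, source registers): hypothetical layout
Instr = Tuple[str, str, Tuple[str, ...]]


def coalesce_moves(code: List[Instr]) -> List[Instr]:
    """Remove 'mov src,dst' when dst is only read afterwards,
    renaming those later reads of dst to src."""
    out = list(code)
    i = 0
    while i < len(out):
        op, dst, srcs = out[i]
        if op == "mov" and len(srcs) == 1 and srcs[0] != dst:
            src = srcs[0]
            rest = out[i + 1:]
            # Safe only if neither register is written after the mov.
            if all(d != src and d != dst for _, d, _ in rest):
                renamed = [
                    (o, d, tuple(src if s == dst else s for s in ss))
                    for o, d, ss in rest
                ]
                out = out[:i] + renamed
                continue  # out[i] is now the next instruction
        i += 1
    return out


# The sequence from the example: mov %src1,%dest1 / add %dest1,%src2,%dest2
before = [
    ("mov", "dest1", ("src1",)),
    ("add", "dest2", ("dest1", "src2")),
]
after = coalesce_moves(before)
```

 Because the mov disappears before register allocation, dest1 never needs a
register of its own, which is the reduction in register pressure that the
later peephole pass cannot give back.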
-------------- next part --------------
A non-text attachment was scrubbed...
Name: DeepOpt_MOV.patch
Type: application/octet-stream
Size: 56583 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20180617/0baaf015/attachment.obj>