[fpc-devel] Kit's ambitions!

J. Gareth Moreton gareth at moreton-family.com
Mon Jun 11 22:07:18 CEST 2018

 Thanks David,

 I'm still learning some of the nuances of the Intel and AMD processors,
but most of it is just logical analysis.  Admittedly my main drive has
been to shrink down the size of the binary, since Delphi and Free Pascal
have always been a little bit bloated in comparison.  Not that it is
necessarily a bad thing, but saving space without sacrificing performance
can only be a good thing, especially for those with limited bandwidth or
for saving those few precious bytes when burning files to a CD or DVD.

 There have been a few instances in the compiled compiler (my main test
case) where an entire register is freed up due to my deep optimisation, and
that means the corresponding "push" and "pop" at either end of the
procedure can be removed (along with the corresponding stack unwinding
information), although I haven't started programming that yet.
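
 As a rough illustration of the idea only (not the planned implementation, which has not been written yet), the check could be sketched as below. The function name `prune_saved_registers`, the tuple representation of instructions and the sample body are all hypothetical. The sketch also ignores stack alignment and the unwinding information, both of which a real compiler would have to update as well.

```python
# Hypothetical model: `saved` lists the callee-saved registers pushed in the
# prologue (in push order); `body` is a list of (mnemonic, operand, ...) tuples.
def prune_saved_registers(saved, body):
    # Collect every register operand that still appears in the body.
    used = {op for _mnemonic, *ops in body for op in ops}
    # Keep a save/restore pair only if the body still touches the register.
    kept = [reg for reg in saved if reg in used]
    prologue = [("push", reg) for reg in kept]
    # Pops mirror the pushes in reverse order.
    epilogue = [("pop", reg) for reg in reversed(kept)]
    return prologue, epilogue


# Assumed example: %rbx was saved, but the optimised body no longer uses it.
body = [("mov", "rcx", "rax"), ("mov", "rdx", "rsi")]
prologue, epilogue = prune_saved_registers(["rbx", "rsi"], body)
print(prologue, epilogue)
```

 Here only the save and restore of %rsi survive, because %rbx no longer appears anywhere in the body.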

 I am ready to submit this part of my deep optimiser as a patch.  I'm just
waiting for Florian's acceptance or rejection of my debug strip patch -
https://bugs.freepascal.org/view.php?id=33798 (the 3rd attempt!) - only
because it shares some debugging code with said patch (it was useful to
monitor how the registers inside references were changed).  If it's
rejected, it just means I'll have to change some of that debugging code a

 Gareth aka. Kit 

 On Mon 11/06/18 20:27, David Pethes public at satd.sk sent:
 nice work. 

 On 8. 6. 2018 0:46, J. Gareth Moreton wrote: 

 > The deep optimiser changes this to: 
 > movq %rcx,%rax 
 > movq %rdx,%rsi 
 > movq %rcx,%rbx 
 > It determines, for the third MOV, it can 
 > change %rax for %rcx to minimise a 
 > pipeline stall, and then knows that %rbx 
 > and %rcx contain the same value, so can 
 > remove the 4th MOV completely. Given that 
 > modern processors usually have at least 3 
 > ALUs and the interdependencies have been 
 > removed, this will likely give a speed 
 > increase of one cycle over these few 
 > commands. 
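
 The rewrite described in the quote can be sketched as a small copy-propagation pass over straight-line register-to-register moves. This is a minimal sketch under assumptions, not the deep optimiser's actual code: the function name `propagate_copies` and the four-instruction input are hypothetical (the original input sequence is not quoted in this message), and the input was chosen only so that the result matches the three MOVs shown above. It handles nothing except MOVs between registers within one basic block.

```python
# Each instruction is a (src, dst) pair in AT&T operand order: mov src,dst.
def propagate_copies(moves):
    copy_of = {}  # register -> register whose value it is known to hold
    out = []
    for src, dst in moves:
        # Read from the oldest register known to hold the same value,
        # which removes the dependency on the intermediate copy.
        root = copy_of.get(src, src)
        if copy_of.get(dst, dst) == root:
            # dst already holds this value, so the move is redundant.
            continue
        # dst is overwritten: forget registers recorded as copies of it.
        for reg in [r for r, origin in copy_of.items() if origin == dst]:
            del copy_of[reg]
        copy_of[dst] = root
        out.append((root, dst))
    return out


# Hypothetical input, consistent with the description in the quote.
before = [("rcx", "rax"), ("rdx", "rsi"), ("rax", "rbx"), ("rbx", "rcx")]
after = propagate_copies(before)
for src, dst in after:
    print(f"movq %{src},%{dst}")
```

 The third move is rewritten to read %rcx instead of %rax, and the fourth disappears because %rcx already holds the value it would receive.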

 Note that modern CPUs can use move elimination for register-to-register
 moves, so such a move doesn't cost any execution resources (it's "free").
 Despite that, removing it is still a win, because it saves both bytes in
 the I-cache and decoder bandwidth (which can indirectly lead to some
 spared cycle(s) at other
