[fpc-devel] Kit's ambitions!

J. Gareth Moreton gareth at moreton-family.com
Fri Jun 8 00:46:00 CEST 2018

So a progress update.

I've tied in part of my deep optimiser 
into the peephole optimiser, specifically 
PostPeepholeOptMov, and it's had some 
unexpected benefits. One of the things it 
does is start with a MOV command that 
copies a register's contents into another, 
then looks at subsequent reference 
addresses to see if it can swap out one 
register for another, to reduce the chance 
of a pipeline stall. There are cases where
it notices that every such register in a
block has been switched, and hence it can
safely remove the original MOV command.

What this means is that as well as 
reducing the chances of a pipeline stall, 
it's removing unnecessary assignments.

My main test case has been compiling the 
compiler, since it's sufficiently complex 
and easy to crash if incorrect machine 
code is produced, and it also gives plenty 
of examples of optimisation. As a very 
brief example, in 
compiler/x86_64/symcpu.pas in 
TCPUProcDef.ppuload_platform, the first 
four lines are:

movq %rcx,%rax
movq %rdx,%rsi
movq %rax,%rbx
movq %rbx,%rcx

The deep optimiser changes this to:

movq %rcx,%rax
movq %rdx,%rsi
movq %rcx,%rbx

It determines that, in the third MOV, it can
replace %rax with %rcx to minimise the chance
of a pipeline stall. It then knows that %rbx
and %rcx contain the same value, so it can
remove the fourth MOV completely. Given that
modern processors usually have at least 3
ALUs and the interdependencies have been
removed, this will likely give a speed
increase of one cycle over these few
instructions.
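The two steps described for this example can be reproduced with a small value-tracking sketch. This is a simplification, assuming straight-line register-to-register movq instructions only. Each register's contents are tracked as the name of the register that held that value on entry to the block. A copy's source is redirected to that original register while it still holds the value, and a copy into a register that already holds the value is dropped. The function name forward_copies is hypothetical, and this is not the deep optimiser's actual code.

```python
from typing import Dict, List, Tuple

Mov = Tuple[str, str]  # movq src,dst in AT&T operand order; registers only (assumption)


def forward_copies(block: List[Mov]) -> List[Mov]:
    """Toy copy forwarding plus redundant-MOV removal over a straight-line block."""
    holds: Dict[str, str] = {}  # register -> entry register whose value it now holds

    def value(reg: str) -> str:
        # A register that has not been written yet still holds its own entry value.
        return holds.get(reg, reg)

    out: List[Mov] = []
    for src, dst in block:
        v = value(src)
        if value(v) == v:
            # The original holder still has the value, so read it directly and
            # drop the dependency on the earlier MOV.
            src = v
        if value(dst) == v:
            # dst already contains this value, so the MOV is a no-op.
            continue
        out.append((src, dst))
        holds[dst] = v
    return out


if __name__ == "__main__":
    # First four instructions of TCPUProcDef.ppuload_platform, as quoted above.
    seq = [("%rcx", "%rax"), ("%rdx", "%rsi"), ("%rax", "%rbx"), ("%rbx", "%rcx")]
    for s, d in forward_copies(seq):
        print(f"movq {s},{d}")
    # Prints:
    # movq %rcx,%rax
    # movq %rdx,%rsi
    # movq %rcx,%rbx
```

The check value(v) == v is what keeps the rewrite safe. If the original register has since been overwritten, the source is left unchanged.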

Before I go submitting patches though, I 
still need to test it under Linux and 

