[fpc-devel] Kit's ambitions!
J. Gareth Moreton
gareth at moreton-family.com
Fri Jun 8 00:46:00 CEST 2018
So a progress update.
I've tied in part of my deep optimiser
into the peephole optimiser, specifically
PostPeepholeOptMov, and it's had some
unexpected benefits. One of the things it
does is start with a MOV command that
copies a register's contents into another,
then look at subsequent reference
addresses to see if it can swap out one
register for another, to reduce the chance
of a pipeline stall. There are cases where
it's noticed that all such registers have
been switched in a certain block and hence
safely removes the original MOV command.
What this means is that as well as
reducing the chances of a pipeline stall,
it's removing unnecessary assignments.
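
As a rough illustration of the step described above (not the actual PostPeepholeOptMov code), here is a minimal Python sketch. The function name forward_copy, the tuple representation and the live_out parameter are all hypothetical. It assumes every instruction in the block is a MOV in AT&T operand order and that only full 64-bit register names appear, so sub-register aliasing is ignored.

```python
import re

REG = re.compile(r"%\w+")

def swap_reg(operand, old, new):
    # Replace whole register names only, e.g. %rax inside 8(%rax).
    return REG.sub(lambda m: new if m.group() == old else m.group(), operand)

def forward_copy(block, live_out=()):
    """block[0] must be a register-to-register MOV (op, src, dst).

    Later reads of dst are rewritten to read src while both registers still
    hold the same value. The leading MOV is dropped when every read of dst
    was rewritten and dst is not needed after the block (live_out).
    """
    first, rest = block[0], block[1:]
    _, src, dst = first
    out = []
    state = "copy"      # "copy": src and dst are known to hold the same value
    removable = True
    for i, (op, s, d) in enumerate(rest):
        d_is_reg = REG.fullmatch(d) is not None
        # A memory destination such as 8(%rax) reads its address registers.
        reads = s if d_is_reg else s + " " + d
        if state == "copy":
            s = swap_reg(s, dst, src)
            if not d_is_reg:
                d = swap_reg(d, dst, src)
        elif dst in REG.findall(reads):
            removable = False           # a read of dst we could not rewrite
        out.append((op, s, d))
        if d_is_reg and d == dst:
            out.extend(rest[i + 1:])    # dst redefined: the copy ends here
            break
        if d_is_reg and d == src:
            state = "clobbered"         # src overwritten: stop rewriting
    else:
        if dst in live_out:
            removable = False           # dst is still wanted after the block
    return ([] if removable else [first]) + out

block = [("movq", "%rcx", "%rax"),
         ("movq", "(%rax)", "%rdx"),
         ("movq", "8(%rax)", "%rsi")]
result = forward_copy(block)
```

With the sample block, both references are rewritten to use %rcx and, since %rax is assumed dead afterwards, the leading MOV disappears. Passing live_out={"%rax"} keeps the MOV but still rewrites the references.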
My main test case has been compiling the
compiler, since it's sufficiently complex
and easy to crash if incorrect machine
code is produced, and it also gives plenty
of examples of optimisation. As a very
brief example, in
compiler/x86_64/symcpu.pas in
TCPUProcDef.ppuload_platform, the first
four lines are:
movq %rcx,%rax
movq %rdx,%rsi
movq %rax,%rbx
movq %rbx,%rcx
The deep optimiser changes this to:
movq %rcx,%rax
movq %rdx,%rsi
movq %rcx,%rbx
It determines that, for the third MOV, it
can swap %rax for %rcx to reduce the chance
of a pipeline stall, and then knows that
%rbx and %rcx contain the same value, so it
can remove the 4th MOV completely. Given that
modern processors usually have at least 3
ALUs and the interdependencies have been
removed, this will likely give a speed
increase of one cycle over these few
commands.
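
The reasoning on those four instructions can be reproduced with a small value-tracking pass. This is only a sketch under the assumption that the block contains nothing but register-to-register MOVs; the name propagate_copies and the holds table are hypothetical and do not come from the compiler sources.

```python
def propagate_copies(movs):
    """movs: list of (src, dst) register-to-register MOVs in AT&T order."""
    holds = {}  # register -> register whose block-entry value it now holds
    out = []
    for src, dst in movs:
        root = holds.get(src, src)         # where the value in src came from
        if holds.get(dst, dst) == root:
            continue                       # dst already has this value: drop the MOV
        if holds.get(root, root) == root:
            src = root                     # root is unmodified: read it directly
        out.append((src, dst))
        holds[dst] = root
    return out

block = [("%rcx", "%rax"),
         ("%rdx", "%rsi"),
         ("%rax", "%rbx"),
         ("%rbx", "%rcx")]

for src, dst in propagate_copies(block):
    print(f"movq {src},{dst}")
# prints:
# movq %rcx,%rax
# movq %rdx,%rsi
# movq %rcx,%rbx
```

The third MOV is redirected to read %rcx because %rcx still holds its entry value, and the fourth is dropped because %rcx already holds the value it would receive.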
Before I go submitting patches though, I
still need to test it under Linux and
i386.
Kit