[fpc-devel] Peephole optimizer passes
J. Gareth Moreton
gareth at moreton-family.com
Tue Jan 25 21:14:26 CET 2022
Hi everyone,
So I've found that with the peephole optimizer, at least on x86, running
pass 2 more than once often catches additional optimisations that would
otherwise be missed. At the same time, I've found some bugs that get
triggered when pass 2 is run again (which is why I asked about
RegLoadedWithNewValue in another thread).
I'm working out how best to permit this, given that pass 2 has only ever
been run once and that changing it is a cross-platform matter that will
slow compilation down across the board. That said, if pass 2 is only run
multiple times at -O3 and above, I figure the extra compilation time is
acceptable.
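A toy Python sketch of that policy, assuming a hypothetical model in which pass 2 is a single forward sweep over (source, destination) move pairs that never looks back at instructions it has already passed. The helpers `pass2_sweep` and `run_pass2`, the two rules and the iteration cap are all illustrative and are not FPC's actual optimizer code. The example shows one plausible way a change late in a sweep (deleting a no-op move) can expose an opportunity the sweep has already walked past, which only a second run of the pass picks up.

```python
from typing import List, Tuple

# Hypothetical toy model: a 64-bit "movq" reduced to (source, destination),
# in AT&T operand order. This is not how FPC represents instructions.
Instr = Tuple[str, str]


def is_reg(op: str) -> bool:
    return op.startswith("%")


def reads(op: str, reg: str) -> bool:
    # A register operand reads reg if it is reg; a memory operand such as
    # "8(%rbx)" reads reg if reg appears in its address (toy check only).
    return reg in op


def pass2_sweep(code: List[Instr]) -> Tuple[List[Instr], bool]:
    """One forward sweep of two toy rules; never revisits earlier entries."""
    code = list(code)
    changed = False
    i = 0
    while i < len(code):
        src, dst = code[i]
        # Rule 1: "movq %reg,%reg" does nothing and can be deleted.
        if src == dst:
            del code[i]
            changed = True
            continue  # re-check slot i, but not the pair (i-1, i)
        # Rule 2: a register write that the next move overwrites without
        # reading it is dead.
        if i + 1 < len(code):
            nsrc, ndst = code[i + 1]
            if is_reg(dst) and ndst == dst and not reads(nsrc, dst):
                del code[i]
                changed = True
                continue
        i += 1
    return code, changed


def run_pass2(code: List[Instr], opt_level: int, max_iterations: int = 8) -> List[Instr]:
    # Assumed policy: one sweep below -O3; at -O3 and above, repeat until
    # nothing changes or the (arbitrary) iteration cap is reached.
    limit = max_iterations if opt_level >= 3 else 1
    for _ in range(limit):
        code, changed = pass2_sweep(code)
        if not changed:
            break
    return code


example = [("%rax", "%rbx"), ("%rcx", "%rcx"), ("%rdx", "%rbx")]
print(run_pass2(example, opt_level=2))  # [('%rax', '%rbx'), ('%rdx', '%rbx')]
print(run_pass2(example, opt_level=3))  # [('%rdx', '%rbx')]
```

Capping the number of iterations keeps the worst-case compile time bounded, even if two rules were ever to undo each other's work.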
Additionally, I've found that running certain elements of pass 1 again
also yields some new optimisations. In this instance, though, I figure
it's best to just run those optimisations again in pass 2 instead of
falling back to pass 1, although I'll have to experiment to see if this
catches all eventualities.
On another note, I do wonder if the pre-peephole pass should be merged
into pass 1, with pass 1 then run up to three times at -O2 instead of
twice, so that the level of optimisation stays the same. Then again, I'm
not certain whether other platforms do some special instruction
manipulation in the pre-peephole pass that would be incompatible with a
regular pass.
Gareth aka. Kit
P.S. Here are some examples. In ninl, before:
.Lj1162:
movq %r13,%rcx
call NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
movq %rax,56(%rsp)
movq 56(%rsp),%rdi
jmp .Lj1141
.balign 16,0x90
After:
.Lj1162:
movq %r13,%rcx
call NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
movq %rax,56(%rsp)
movq %rax,%rdi
jmp .Lj1141
.balign 16,0x90
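The ninl change is a store-to-load forward: the value just stored to 56(%rsp) is still in %rax, so the reload can read %rax instead of memory, while the store itself stays (presumably because the stack slot may be read later). A minimal Python sketch of that rule over adjacent instructions, assuming a hypothetical (mnemonic, source, destination) tuple representation; the name `forward_stored_register` is illustrative and not part of FPC.

```python
from typing import List, Tuple

# Hypothetical toy instruction: (mnemonic, source, destination), AT&T order.
Instr = Tuple[str, str, str]


def is_reg(op: str) -> bool:
    return op.startswith("%")


def forward_stored_register(code: List[Instr]) -> List[Instr]:
    """Turn 'mov %r1,mem ; mov mem,%r2' into 'mov %r1,mem ; mov %r1,%r2'.

    The store is kept, since something may still read the memory later.
    Only adjacent pairs are considered, so nothing can change %r1 or the
    address registers between the two moves.
    """
    out = list(code)
    for i in range(len(out) - 1):
        op1, src1, dst1 = out[i]
        op2, src2, dst2 = out[i + 1]
        if (
            op1 == op2 and op1.startswith("mov")   # same move, same size
            and is_reg(src1) and not is_reg(dst1)  # register stored to memory
            and src2 == dst1                       # same memory reloaded...
            and is_reg(dst2)                       # ...into a register
        ):
            out[i + 1] = (op2, src1, dst2)
    return out


before = [("movq", "%rax", "56(%rsp)"), ("movq", "56(%rsp)", "%rdi")]
print(forward_stored_register(before))
# [('movq', '%rax', '56(%rsp)'), ('movq', '%rax', '%rdi')]
```

Requiring the same mnemonic on both moves is a conservative way of making sure the load reads back exactly the bytes that were stored.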
In SysUtils, this sequence appears surprisingly often on x86_64-win64:
.Lj7572:
movq -40(%rbp),%rax
cmpb $0,-292(%rax)
jne .Lj7577
movq -40(%rbp),%rcx
movl $1,%r8d
movq -48(%rbp),%rdx
call SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
movb %sil,%dil
movb -4(%rbp),%sil
movb %sil,%dil
movb -4(%rbp),%sil
jmp .Lj7447
.p2align 4,,10
.p2align 3
With the additional passes and optimisations, this becomes:
.Lj7572:
movq -40(%rbp),%rax
cmpb $0,-292(%rax)
jne .Lj7577
movq -40(%rbp),%rcx
movl $1,%r8d
movq -48(%rbp),%rdx
call SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
movb -4(%rbp),%dil
movb %dil,%sil
jmp .Lj7447
.p2align 4,,10
.p2align 3
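One quick way to convince yourself that the shorter SysUtils sequence is equivalent is to run both versions on symbolic values and compare the final registers. A small Python sketch, assuming each `movb` is reduced to a (source, destination) pair and that operands can be treated as plain names; that is only sound here because none of these moves writes to memory or to %rbp, and MOV does not affect the flags. The helper `simulate` is hypothetical.

```python
from typing import Dict, List, Tuple

# Toy "movb" instruction: (source, destination) in AT&T operand order.
Instr = Tuple[str, str]


def simulate(code: List[Instr], state: Dict[str, str]) -> Dict[str, str]:
    """Execute straight-line moves on symbolic values, returning the new state.

    Operands are used as plain keys, which is only sound because no move
    here writes memory or changes the %rbp used in "-4(%rbp)".
    """
    state = dict(state)
    for src, dst in code:
        state[dst] = state[src]
    return state


# The sequence after the call, before and after the extra optimisation work.
before = [("%sil", "%dil"), ("-4(%rbp)", "%sil"),
          ("%sil", "%dil"), ("-4(%rbp)", "%sil")]
after = [("-4(%rbp)", "%dil"), ("%dil", "%sil")]

# Symbolic initial values: whatever %sil and %dil held, plus the byte at -4(%rbp).
initial = {"%sil": "old_sil", "%dil": "old_dil", "-4(%rbp)": "m"}

print(simulate(before, initial))  # {'%sil': 'm', '%dil': 'm', '-4(%rbp)': 'm'}
print(simulate(before, initial) == simulate(after, initial))  # True
```

Both versions leave the byte from -4(%rbp) in %sil and %dil, and the old contents of both registers are overwritten either way, so the first two moves of the original sequence were dead.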