[fpc-devel] Peephole optimizer passes

Tue Jan 25 21:14:26 CET 2022

Hi everyone,

I've found that with the peephole optimizer, at least on x86, running 
pass 2 more than once often catches optimisations that would otherwise 
be missed.  At the same time, I've found some bugs that are only 
triggered when pass 2 is run again (which is why I asked about 
RegLoadedWithNewValue in another thread).

I'm working out how best to permit this, given that pass 2 has only ever 
been run once and the change is cross-platform, so it will slow 
compilation across the board.  I figure that if pass 2 only runs 
multiple times on -O3 and above, the extra compile time is acceptable.
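
As a rough illustration of the idea (a toy model in Python, not FPC's 
actual Pascal code), the sketch below reruns a pass until it reports no 
change or an iteration cap is reached, and only allows more than one run 
at -O3 and above.  The names run_pass_repeatedly, cancel_push_pop and 
max_iterations are hypothetical, and the cap of 4 is an arbitrary 
assumption.  The toy pass makes a single forward scan, so removing an 
inner push/pop pair only exposes the outer pair on the next run.

```python
from typing import Callable, List, Tuple

# A pass takes an instruction list and returns (new list, changed?).
Pass = Callable[[List[str]], Tuple[List[str], bool]]


def cancel_push_pop(code: List[str]) -> Tuple[List[str], bool]:
    """Toy pass: one forward scan that deletes adjacent 'push X' / 'pop X' pairs."""
    out: List[str] = []
    changed = False
    i = 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i].startswith("push ")
                and code[i + 1] == "pop " + code[i][5:]):
            i += 2          # drop both instructions
            changed = True
        else:
            out.append(code[i])
            i += 1
    return out, changed


def run_pass_repeatedly(code: List[str], pass_fn: Pass, opt_level: int,
                        max_iterations: int = 4) -> Tuple[List[str], int]:
    """Run pass_fn until it makes no change; below -O3, run it exactly once."""
    limit = max_iterations if opt_level >= 3 else 1
    runs = 0
    for _ in range(limit):
        code, changed = pass_fn(code)
        runs += 1
        if not changed:
            break
    return code, runs


nested = ["push %rax", "push %rbx", "pop %rbx", "pop %rax"]
print(run_pass_repeatedly(nested, cancel_push_pop, opt_level=2))
print(run_pass_repeatedly(nested, cancel_push_pop, opt_level=3))
```

The last run at -O3 finds nothing to change, which is the price of 
detecting that the pass has converged.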

Additionally, I've found that running certain elements of pass 1 again 
also yields some new optimisations.  In this instance I figure it's best 
to just run these optimisations again in pass 2 instead of falling back 
to pass 1, although I'll have to experiment to see if this catches all 
eventualities.

On another note, I do wonder if the pre-peephole pass should be merged 
into pass 1, and then pass 1 be run up to 3 times on -O2 instead of 
twice so the level of optimisation is identical.  Then again, I'm not 
certain if other platforms do some special instruction manipulation that 
would be incompatible with a regular pass.

Gareth aka. Kit

P.S. Some examples.  In ninl, for instance, before:

.Lj1162:
     movq    %r13,%rcx
     call    NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
     movq    %rax,56(%rsp)
     movq    56(%rsp),%rdi
     jmp    .Lj1141
     .balign 16,0x90

After:

.Lj1162:
     movq    %r13,%rcx
     call    NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE
     movq    %rax,56(%rsp)
     movq    %rax,%rdi
     jmp    .Lj1141
     .balign 16,0x90
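
The rewrite in this first example (a store to memory followed directly 
by a load from the same location) can be modelled with the toy Python 
sketch below.  It is only an illustration under simplifying assumptions, 
not how the compiler implements it: forward_store_to_load is a 
hypothetical name, only adjacent movq pairs are considered, and operands 
containing commas such as (%rax,%rbx,8) are not handled.

```python
import re
from typing import List, Tuple

# Matches "movq src,dst" for simple operands without embedded commas.
MOV = re.compile(r"^movq\s+(\S+),(\S+)$")


def forward_store_to_load(code: List[str]) -> Tuple[List[str], bool]:
    """Rewrite 'movq %r1,mem ; movq mem,%r2' so the load becomes 'movq %r1,%r2'."""
    out = list(code)
    changed = False
    for i in range(len(out) - 1):
        store = MOV.match(out[i].strip())
        load = MOV.match(out[i + 1].strip())
        if not store or not load:
            continue
        src, mem = store.groups()
        load_src, dst = load.groups()
        is_store = src.startswith("%") and not mem.startswith("%")
        if is_store and load_src == mem and dst.startswith("%"):
            # The store is kept; only the reload is replaced by a register move.
            out[i + 1] = f"movq {src},{dst}"
            changed = True
    return out, changed


before = [
    "movq %r13,%rcx",
    "call NCON_$$_GENENUMNODE$TENUMSYM$$TORDCONSTNODE",
    "movq %rax,56(%rsp)",
    "movq 56(%rsp),%rdi",
    "jmp .Lj1141",
]
after, changed = forward_store_to_load(before)
print("\n".join(after))
```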

In SysUtils, this sequence appears surprisingly often on x86_64-win64:

.Lj7572:
     movq    -40(%rbp),%rax
     cmpb    $0,-292(%rax)
     jne    .Lj7577
     movq    -40(%rbp),%rcx
     movl    $1,%r8d
     movq    -48(%rbp),%rdx
     call SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
     movb    %sil,%dil
     movb    -4(%rbp),%sil
     movb    %sil,%dil
     movb    -4(%rbp),%sil
     jmp    .Lj7447
     .p2align 4,,10
     .p2align 3

With the additional passes and optimisations, it becomes:

.Lj7572:
     movq    -40(%rbp),%rax
     cmpb    $0,-292(%rax)
     jne    .Lj7577
     movq    -40(%rbp),%rcx
     movl    $1,%r8d
     movq    -48(%rbp),%rdx
     call SYSUTILS$_$DATETIMETOSTRING$hxuwovHuJEHC_$$_STORESTR$PCHAR$LONGINT
     movb    -4(%rbp),%dil
     movb    %dil,%sil
     jmp    .Lj7447
     .p2align 4,,10
     .p2align 3
