[fpc-devel] Potential whole program optimization

J. Gareth Moreton gareth at moreton-family.com
Mon Jul 19 01:24:45 CEST 2021


Hi everyone,

I've been playing around with the peephole optimizer on x86_64 a lot 
lately, and I'm starting to notice that many procedures, both in the 
RTL and in the compiler itself, produce identical assembly language when 
fully optimized (or sometimes even before that point).  As an example, 
here are three of the ReadData overloads from TStream in the classes unit:

.section .text.n_classes$_$tstream_$__$$_readdata$char$$nativeint,"ax"
     .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
     leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
     movl    $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
     movq    (%rcx),%rax
     call    *256(%rax)
     movslq  %eax,%rax
     nop
     leaq    40(%rsp),%rsp
     ret
.seh_endproc

.section .text.n_classes$_$tstream_$__$$_readdata$shortint$$nativeint,"ax"
     .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
     leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
     movl    $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
     movq    (%rcx),%rax
     call    *256(%rax)
     movslq  %eax,%rax
     nop
     leaq    40(%rsp),%rsp
     ret
.seh_endproc

.section .text.n_classes$_$tstream_$__$$_readdata$byte$$nativeint,"ax"
     .balign 16,0x90
.globl    CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
     leaq    -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
     movl    $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
     movq    (%rcx),%rax
     call    *256(%rax)
     movslq  %eax,%rax
     nop
     leaq    40(%rsp),%rsp
     ret
.seh_endproc

The final assembly language of each method is identical.

(Note that the trunk is not this efficient just yet: it still leaves a 
"movq %rcx,%rax" instruction before "movl $1,%r8d" and then executes 
"movq (%rax),%rax" instead of "movq (%rcx),%rax".  The three methods are 
still identical to each other, though.)

Would it be plausible to calculate and store a message digest (hash) of 
the final form of each procedure's tai entries or machine code, then use 
digest collisions to identify potential duplicate procedures for 
whole-program optimization?  Granted, I don't know anything about WPO 
yet, so I don't know how plausible this is.  This wouldn't be something 
done on quick or debug builds, because you need to be able to do proper 
stack traces there, and having identical procedures merged into one 
might cause confusion.
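
As a rough illustration of the digest idea, here is a minimal sketch in 
Python (chosen only for brevity; inside the compiler the natural input 
would be each procedure's final tai list rather than assembler text).  
It assumes AT&T-syntax text with "#" comments and local labels of the 
form ".Lj5".  Before hashing, it drops the per-procedure ".section" 
line, replaces the procedure's own symbol with a placeholder, and 
renumbers local labels, so only the instructions themselves are 
compared.  A digest match is treated purely as a candidate and is 
confirmed by comparing the full bodies.  Turning duplicates into GNU as 
".set" aliases of the first body is a hypothetical merging strategy, 
and the names body_key, find_duplicates, alias_directives and 
tstream_readdata are made up for this example.

```python
import hashlib
import re
from collections import defaultdict

# Assumed spelling of compiler-generated local labels, e.g. ".Lj5".
LOCAL_LABEL = re.compile(r"\.L\w+")


def body_key(name, lines):
    """Reduce one procedure's assembly to a name-independent tuple of lines."""
    # Match the procedure's own symbol only as a whole identifier.
    self_ref = re.compile(r"(?<![\w$])" + re.escape(name) + r"(?![\w$])")
    labels = {}

    def canon_label(m):
        # Number local labels in order of first appearance, so bodies that
        # differ only in label numbering produce the same key.
        return labels.setdefault(m.group(0), f".Lcanon{len(labels)}")

    out = []
    for line in lines:
        text = line.split("#", 1)[0].strip()  # drop "#" comments
        if not text or text.startswith(".section"):
            continue  # per-procedure section; its name is derived from the symbol
        text = self_ref.sub("<self>", text)
        text = LOCAL_LABEL.sub(canon_label, text)
        out.append(" ".join(text.split()))  # normalise whitespace
    return tuple(out)


def find_duplicates(procs):
    """Group procedure names (dict name -> asm lines) with identical bodies.

    The digest only nominates candidates; equal keys confirm each match,
    so a hash collision can never merge two different bodies."""
    buckets = defaultdict(list)
    for name, lines in procs.items():
        key = body_key(name, lines)
        digest = hashlib.sha256("\n".join(key).encode()).hexdigest()
        buckets[digest].append((name, key))
    groups = []
    for candidates in buckets.values():
        by_body = defaultdict(list)
        for name, key in candidates:
            by_body[key].append(name)
        groups.extend(names for names in by_body.values() if len(names) > 1)
    return groups


def alias_directives(group):
    """Hypothetical: keep the first body and alias the others to it."""
    keep, *dupes = group
    lines = []
    for d in dupes:
        lines += [f".globl {d}", f".set {d}, {keep}"]
    return lines


def tstream_readdata(sym):
    """One of the three ReadData listings above, rebuilt from a template."""
    return [
        f'.section .text.n_{sym.lower()},"ax"',
        ".balign 16,0x90",
        f".globl {sym}",
        f"{sym}:",
        f".seh_proc {sym}",
        "leaq -40(%rsp),%rsp",
        ".seh_stackalloc 40",
        ".seh_endprologue",
        "# Peephole Optimization: Mov2Nop 3b done",
        "movl $1,%r8d",
        "movq (%rcx),%rax",
        "call *256(%rax)",
        "movslq %eax,%rax",
        "nop",
        "leaq 40(%rsp),%rsp",
        "ret",
        ".seh_endproc",
    ]


if __name__ == "__main__":
    names = [f"CLASSES$_$TSTREAM_$__$$_READDATA${t}$$NATIVEINT"
             for t in ("CHAR", "SHORTINT", "BYTE")]
    groups = find_duplicates({n: tstream_readdata(n) for n in names})
    for group in groups:
        print("\n".join(alias_directives(group)))
```

Run on the three ReadData listings, it should report a single group and 
print ".globl"/".set" pairs aliasing the SHORTINT and BYTE symbols to 
the CHAR one; the duplicate bodies themselves would then be left out of 
the output.  Whether aliasing like this works with SEH unwind data and 
the linker on every target is an open assumption, and it does nothing 
for the stack-trace concern, since a trace would only ever show the 
kept name.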

Gareth aka. Kit

