[fpc-devel] Potential whole program optimization
J. Gareth Moreton
gareth at moreton-family.com
Mon Jul 19 01:24:45 CEST 2021
Hi everyone,
I've been playing around with the peephole optimizer on x86_64 a lot
lately, and I'm starting to notice that a lot of procedures, both in the
RTL and the compiler itself, produce the same assembly language when
fully optimized (or sometimes even before this point). As an example,
here is the assembly for three TStream methods in the classes unit:
.section .text.n_classes$_$tstream_$__$$_readdata$char$$nativeint,"ax"
.balign 16,0x90
.globl CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$CHAR$$NATIVEINT
leaq -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
movl $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
movq (%rcx),%rax
call *256(%rax)
movslq %eax,%rax
nop
leaq 40(%rsp),%rsp
ret
.seh_endproc
.section .text.n_classes$_$tstream_$__$$_readdata$shortint$$nativeint,"ax"
.balign 16,0x90
.globl CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$SHORTINT$$NATIVEINT
leaq -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
movl $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
movq (%rcx),%rax
call *256(%rax)
movslq %eax,%rax
nop
leaq 40(%rsp),%rsp
ret
.seh_endproc
.section .text.n_classes$_$tstream_$__$$_readdata$byte$$nativeint,"ax"
.balign 16,0x90
.globl CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT:
.seh_proc CLASSES$_$TSTREAM_$__$$_READDATA$BYTE$$NATIVEINT
leaq -40(%rsp),%rsp
.seh_stackalloc 40
.seh_endprologue
# Peephole Optimization: Mov2Nop 3b done
movl $1,%r8d
# Peephole Optimization: %rcx = %rax; removed unnecessary instruction (MovMov2MovNop 6b}
# Peephole Optimization: %rax = %rcx; changed to minimise pipeline stall (MovXXX2MovXXX)
movq (%rcx),%rax
call *256(%rax)
movslq %eax,%rax
nop
leaq 40(%rsp),%rsp
ret
.seh_endproc
The final assembly language of each method is identical.
(Note that the trunk is not this efficient just yet... it still leaves a
"movq %rcx,%rax" instruction prior to "movl $1,%r8d" and then uses
"movq (%rax),%rax" instead of "movq (%rcx),%rax" - the three methods are
still identical to each other, though).
Would it be plausible to calculate and store a message digest (hash) of
the final form of the tai entries or machine code, and use matching
digests to identify potential duplicate procedures for whole-program
optimization? Granted, I don't know anything about WPO yet, so I don't
know how feasible this is. This wouldn't be something done on quick or
debug builds, because you need to be able to do proper stack traces
there, and having identical procedures merged into one might cause
confusion.
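
To make the idea concrete, here is a minimal sketch, written in Python
purely for illustration. It is not compiler code and does not touch the
real tai structures. It assumes each procedure is available as a list of
assembly lines; the helper names (normalise, digest, find_duplicates)
and the shortened procedure names are hypothetical. Comments, labels and
directives are stripped before hashing, which is a simplification, and a
matching digest is only treated as a hint: the bodies are compared
exactly before two procedures are reported as duplicates.

```python
import hashlib
from collections import defaultdict


def normalise(lines):
    """Keep only instructions: drop blank lines, '#' comments,
    assembler directives (leading '.') and labels (trailing ':').
    This is a simplification made for the sketch."""
    body = []
    for line in lines:
        text = line.strip()
        if not text or text.startswith("#") or text.startswith(".") or text.endswith(":"):
            continue
        body.append(" ".join(text.split()))  # collapse runs of whitespace
    return tuple(body)


def digest(body):
    """SHA-256 over the normalised instruction sequence."""
    h = hashlib.sha256()
    for instr in body:
        h.update(instr.encode("utf-8"))
        h.update(b"\n")
    return h.hexdigest()


def find_duplicates(procedures):
    """procedures maps a name to its assembly lines.
    Returns a list of name groups whose bodies are identical."""
    buckets = defaultdict(list)
    for name, lines in procedures.items():
        body = normalise(lines)
        buckets[digest(body)].append((name, body))

    groups = []
    for candidates in buckets.values():
        # Equal digests only flag candidates; confirm with an exact
        # comparison so a hash collision can never merge two procedures.
        by_body = defaultdict(list)
        for name, body in candidates:
            by_body[body].append(name)
        groups.extend(sorted(names) for names in by_body.values() if len(names) > 1)
    return groups


# Hypothetical, shortened input modelled on the listings above.
sample = {
    "READDATA_CHAR": ["movl $1,%r8d", "movq (%rcx),%rax", "call *256(%rax)", "ret"],
    "READDATA_BYTE": ["# comment", "movl $1,%r8d", "movq (%rcx),%rax", "call *256(%rax)", "ret"],
    "OTHER": ["movl $2,%r8d", "ret"],
}
print(find_duplicates(sample))  # [['READDATA_BYTE', 'READDATA_CHAR']]
```

A real implementation would also have to decide how to treat symbol
references, relocations and unwind information inside the body, which
this sketch ignores.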
Gareth aka. Kit