[fpc-devel] Division optimisations
J. Gareth Moreton
gareth at moreton-family.com
Sun Sep 12 09:09:45 CEST 2021
Only two changes appear in the compiled RTL, both in SysUtils, but one
is pretty significant. Here's the "IsLeapYear" function under i386-win32:
Trunk:
.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
.balign 16,0x90.globl SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
pushl %ebx
# Peephole Optimization: Mov2Nop 3 done
# Peephole Optimization: %cx = %ax; changed to minimise pipeline stall
(MovXXX2MovXXX)
movzwl %ax,%ecx
# Peephole Optimization: MovAndTest2Test done
testl $3,%ecx
jne .Lj6455
movl %ecx,%ebx
movl $1374389535,%eax
mull %ebx
shrl $5,%edx
imull $100,%edx
subl %edx,%ebx
jne .Lj6456
movl $1374389535,%eax
mull %ecx
shrl $7,%edx
imull $400,%edx
subl %edx,%ecx
jne .Lj6455
.Lj6456:
movb $1,%al
jmp .Lj6460
.Lj6455:
xorb %al,%al
.Lj6460:
popl %ebx
ret
New algorithm:
.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
.balign 16,0x90
.globl SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
movzwl %ax,%edx
andl $3,%edx
jne .Lj6455
imulw $23593,%ax,%dx
rorw $2,%dx
cmpw $655,%dx
ja .Lj6456
imulw $23593,%ax,%ax
rorw $4,%ax
cmpw $163,%ax
jnbe .Lj6455
.Lj6456:
movb $1,%al
ret
.Lj6455:
xorb %al,%al
ret
Besides the significantly smaller code, there's significantly less
register strain, which may open up some new peephole optimisations (e.g.
I could theoretically use the volatile %cx to store the result of "imulw
$23593,%ax,%dx" (via "movw %dx,%cx" right after it, which would execute
simultaneously with "rorw $2,%dx") and then call "mov %cx,%ax" in place
of the second multiplication, after which the peephole optimizer could
have a lot of fun with simplification!
(Note: The reason why the multiplication by 23593 appears twice is
because that number, or $5C29, is the multiplicative inverse of 25
modulo 2^16. The function is checking for "(Year mod 100) = 0" and
"(Year mod 400) <> 0", and 100 = 25 * 2^2 and 400 = 25 * 2^4)
The most optimal code I can theorise so far is:
.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
.balign 16,0x90
.globl SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
testl $3,%ax
jne .Lj6455
imulw $23593,%ax,%ax
rorw $2,%ax
cmpw $655,%ax
ja .Lj6456
rorw $2,%ax
cmpw $163,%ax
setbeb %al
ret
.Lj6456:
movb $1,%al
ret
.Lj6455:
xorb %al,%al
ret
Changing "jnbe .Lj6455" to "setbeb %al; ret" is the easiest to
implement, but it's a start!
Gareth aka. Kit
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the fpc-devel
mailing list