[fpc-devel] Division optimisations

J. Gareth Moreton gareth at moreton-family.com
Sun Sep 12 09:09:45 CEST 2021


Only two changes appear in the compiled RTL, both in SysUtils, but one 
is pretty significant.  Here's the "IsLeapYear" function under i386-win32:

Trunk:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
     .balign 16,0x90.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
     pushl    %ebx
# Peephole Optimization: Mov2Nop 3 done
# Peephole Optimization: %cx = %ax; changed to minimise pipeline stall 
(MovXXX2MovXXX)
     movzwl    %ax,%ecx
# Peephole Optimization: MovAndTest2Test done
     testl    $3,%ecx
     jne    .Lj6455
     movl    %ecx,%ebx
     movl    $1374389535,%eax
     mull    %ebx
     shrl    $5,%edx
     imull    $100,%edx
     subl    %edx,%ebx
     jne    .Lj6456
     movl    $1374389535,%eax
     mull    %ecx
     shrl    $7,%edx
     imull    $400,%edx
     subl    %edx,%ecx
     jne    .Lj6455
.Lj6456:
     movb    $1,%al
     jmp    .Lj6460
.Lj6455:
     xorb    %al,%al
.Lj6460:
     popl    %ebx
     ret

New algorithm:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
     .balign 16,0x90
.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
     movzwl    %ax,%edx
     andl    $3,%edx
     jne    .Lj6455
     imulw    $23593,%ax,%dx
     rorw    $2,%dx
     cmpw    $655,%dx
     ja    .Lj6456
     imulw    $23593,%ax,%ax
     rorw    $4,%ax
     cmpw    $163,%ax
     jnbe    .Lj6455
.Lj6456:
     movb    $1,%al
     ret
.Lj6455:
     xorb    %al,%al
     ret

Besides the significantly smaller code, there's significantly less 
register strain, which may open up some new peephole optimisations (e.g. 
I could theoretically use the volatile %cx to store the result of "imulw 
$23593,%ax,%dx" (via "movw %dx,%cx" right after it, which would execute 
simultaneously with "rorw $2,%dx") and then call "mov %cx,%ax" in place 
of the second multiplication, after which the peephole optimizer could 
have a lot of fun with simplification!

(Note: The reason why the multiplication by 23593 appears twice is 
because that number, or $5C29, is the multiplicative inverse of 25 
modulo 2^16. The function is checking for "(Year mod 100) = 0" and 
"(Year mod 400) <> 0", and 100 = 25 * 2^2 and 400 = 25 * 2^4)

The most optimal code I can theorise so far is:

.section .text.n_sysutils_$$_isleapyear$word$$boolean,"ax"
     .balign 16,0x90
.globl    SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN
SYSUTILS_$$_ISLEAPYEAR$WORD$$BOOLEAN:
     testl   $3,%ax
     jne    .Lj6455
     imulw    $23593,%ax,%ax
     rorw    $2,%ax
     cmpw    $655,%ax
     ja    .Lj6456
     rorw    $2,%ax
     cmpw    $163,%ax
     setbeb  %al
     ret
.Lj6456:
     movb    $1,%al
     ret
.Lj6455:
     xorb    %al,%al
     ret

Changing "jnbe .Lj6455" to "setbeb %al; ret" is the easiest to 
implement, but it's a start!

Gareth aka. Kit


-- 
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



More information about the fpc-devel mailing list