[fpc-devel] Prototype optimisation... Sliding Window
J. Gareth Moreton
gareth at moreton-family.com
Fri Mar 11 14:33:49 CET 2022
So far I haven't managed to do very accurate timing measurements - I'll
try to resolve that and get more accurate figures, especially as I'm
still writing up the whitepaper explaining everything before I make a
merge request. Compilation time I'm not hugely concerned about since
this sliding window feature is only enabled on -O3 and higher, but the
faster I can get it, the better, especially as this family of Pascal
compilers pride themselves on compilation speed.
Gareth aka. Kit
On 11/03/2022 12:43, Flávio Etrusco via fpc-devel wrote:
> Kudos! In the end, did it make much of a difference in the compilation time?
>
> Em sex., 25 de fev. de 2022 às 02:08, J. Gareth Moreton via fpc-devel
> <fpc-devel at lists.freepascal.org> escreveu:
>> I did it!
>>
>> After a good week of work and getting things wrong, I finally found a
>> solution that works nicely and is extensible, at least for x86. A bit
>> of refactoring and it can be ported to other platforms. I'm just
>> running the test suites to see if I can break things now. Honestly the
>> hard part was making sure all the registers were tracked properly.
>>
>> https://gitlab.com/CuriousKit/optimisations/-/commits/sliding-window
>>
>> Please feel free to test it and break it!
>>
>> It will likely undergo adjustments over time to refactor bits and
>> pieces, add extra instructions (especially as it doesn't support SHRX
>> and SHLX yet, for example) and see if I can make the sliding window more
>> efficient - I increased it in size from 8 to 16 and then 32 entries,
>> since even at 16, some optimisations were missed in the RTL, but this
>> depends on a number of factors.
>>
>> It's gotten some pretty good optimisations. On x86_64-win64 under -O4,
>> the three lazarus binaries... before:
>>
>> lazarus.exe:
>>
>> 313,990,206
>> 313,873,982
>>
>> lazbuild.exe
>>
>> 59,704,391
>> 59,725,895
>>
>> startlazarus.exe
>>
>> 27,471,147
>> 27,461,419
>>
>> And the compiler itself, ppcx64.exe:
>>
>> 3,777,536
>> 3,766,272
>>
>> Remember that though I call the component a "sliding window", it's only
>> because it's very similar to how LZ77 finds start points for run-length
>> encoding, and the comments and variable names mention run-length
>> encoding as it scans sequences of identical instructions. However, at
>> no point is it actually performing data compression, and the end-result
>> is common subexpression elimination. All the space savings are from the
>> removal of redundant sequences of instructions, giving both a space and
>> a speed boost.
>>
>> Almost every source file in the compiler and the RTL shows some kind of
>> improvement. A lot of them are just redundant pointer deallocations, so
>> this will help with cache misses and the like - that aside though, here
>> are a couple of my favourites... one from dbgdwarf -
>> TDebugInfoDwarf.appendsym_fieldvar_with_name_offset:
>>
>> Before:
>>
>> ...
>> .Lj682:
>> leaq (,%r13,8),%rcx
>> movq 120(%rdi),%rax
>> cqto
>> idivq %rcx
>> imulq %r13,%rax
>> movq %rax,%r12
>> addq 56(%rbp),%r12
>> leaq (,%r13,8),%rcx
>> movq 120(%rdi),%rax
>> cqto
>> idivq %rcx
>> movq %rdx,%rsi
>> cmpb $0,U_$SYSTEMS_$$_TARGET_INFO+276(%rip)
>>
>> After:
>>
>> ...
>> .Lj682:
>> leaq (,%r13,8),%rcx
>> movq 120(%rdi),%rax
>> cqto
>> idivq %rcx
>> imulq %r13,%rax
>> movq %rax,%r12
>> addq 56(%rbp),%r12
>> movq %rdx,%rsi
>> cmpb $0,U_$SYSTEMS_$$_TARGET_INFO+276(%rip)
>>
>> This one has successfully removed an IDIV instruction because the
>> algorithm was able to detect that the subroutine wanted the remainder in
>> %edx, and it was still available from the first IDIV call because it
>> hadn't been overwritten, and neither %r13 nor %rdi had changed values so
>> the references are the same.
>>
>> This one is from SysUtils' IsLeapYear function and is one I have
>> personally wanted to optimise further ever since I first saw its
>> disassembly after I implemented the fast "x mod const = 0" algorithm.
>>
>> Before:
>>
>> ...
>> imulw $23593,%cx,%ax
>> rorw $2,%ax
>> cmpw $655,%ax
>> ja .Lj6979
>> imulw $23593,%cx,%ax
>> rorw $4,%ax
>> cmpw $163,%ax
>> setbeb %al
>> ret
>> .Lj6979:
>> ...
>>
>> After:
>>
>> ...
>> imulw $23593,%cx,%ax
>> rorw $2,%ax
>> cmpw $655,%ax
>> ja .Lj6979
>> rorw $2,%ax
>> cmpw $163,%ax
>> setbeb %al
>> ret
>> .Lj6979:
>> ...
>>
>> In this case, the RLE doesn't produce an exact match since the end
>> result of %ax is different, but the important point to note is that as
>> long as %cx doesn't change value (which it doesn't). the first sequence
>> can be transformed into the second sequence via a "rorw $2,%ax"
>> instruction, so the end result is that "imulw $23593,%cx,%ax; rorw
>> $4,%ax" is transmuted into "rorw $2,%ax" based on the previous value of
>> %ax, thus removing a multiplication.
>>
>> Because this has been a mammoth undertaking and is quite a major
>> addition to the compiler, I'm going to hold off making a merge request
>> until I write a document on it, like I've done in the past with some of
>> the other larger optimisations I've made. The branch is available at
>> the link at the top of this e-mail though if anyone wants to take a look
>> at it.
>>
>> One final note is that this optimisation is rather slow and specialist,
>> so much so that I've added a new optimizer switch named 'cs_opt_asmcse'
>> ("Assembly CSE", to differentiate it from "Node CSE"); this is enabled
>> by default with -O3, and requires the peephole optimizer to also be
>> enabled in order to function.
>>
>> Gareth aka. Kit
>>
>>
>> --
>> This email has been checked for viruses by Avast antivirus software.
>> https://www.avast.com/antivirus
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
> _______________________________________________
> fpc-devel maillist - fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the fpc-devel
mailing list