[fpc-devel] Prototype optimisation... Sliding Window

Fri Feb 25 15:19:17 CET 2022

On 25/02/2022 08:29, Marco Borsari via fpc-devel wrote:
> This is very useful, thank you.
> I think FPC has an excellent register allocator, but it is frustrated on 32-bit
> by scarce resources and by the lack of a reloading check.

Unfortunately the equivalent procedure isn't optimised on i386-win32:

.Lj679:
     movl    %eax,%edx
.Lj680:
     movl    %edx,-832(%ebp)
     leal    (,%edx,8),%ecx
     movl    -824(%ebp),%edx
     movl    76(%edx),%eax
     cltd
     idivl    %ecx
     imull    -832(%ebp),%eax
     movl    %eax,-828(%ebp)
     addl    8(%ebp),%eax
     movl    %eax,-828(%ebp)
     movl    -832(%ebp),%eax
     leal    (,%eax,8),%ecx
     movl    -824(%ebp),%edx
     movl    76(%edx),%eax
     cltd
     idivl    %ecx
     movl    %edx,%esi

The compiler has no way of knowing that -832(%ebp) still contains the value 
that %edx held at the start, and hence that the same value is loaded into 
%eax in the repeated sequence (where %eax, rather than %edx, feeds the leal 
instruction, although the optimisation would still fail even if the same 
register were used).  A lot of these optimisations may require a means of 
adding 'hints' to the assembly language list to indicate the known state of 
registers and memory.
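As a rough illustration of what knowing "the state of things" would buy, here is a minimal Python sketch of local value numbering. It runs over a hypothetical three-address encoding of the listing above, starting at the store to -832(%ebp). Local value numbering is a different technique from FPC's peephole optimiser and is used here only to show the idea. The operation names are invented for this sketch: "lea8" for leal (,src,8), "load76" for movl 76(src), and a two-result "idiv". The sketch also assumes that the stack slots never alias each other or the memory at 76(%edx). The store records that -832(%ebp) holds the same value number as the incoming %edx, so the second leal is recognised as a no-op.

```python
from itertools import count

# Hypothetical three-address encoding of the i386 listing, starting at
# "movl %edx,-832(%ebp)". Each entry is (op, sources, destination(s)).
# "lea8" stands for leal (,src,8), "load76" for movl 76(src), and "idiv"
# reads %edx:%eax and %ecx and produces two results (%eax, %edx).
LISTING = [
    ("mov",    ["%edx"],                 "-832(%ebp)"),
    ("lea8",   ["%edx"],                 "%ecx"),
    ("mov",    ["-824(%ebp)"],           "%edx"),
    ("load76", ["%edx"],                 "%eax"),
    ("cltd",   ["%eax"],                 "%edx"),
    ("idiv",   ["%edx", "%eax", "%ecx"], ("%eax", "%edx")),
    ("imul",   ["%eax", "-832(%ebp)"],   "%eax"),
    ("mov",    ["%eax"],                 "-828(%ebp)"),
    ("add",    ["%eax", "8(%ebp)"],      "%eax"),
    ("mov",    ["%eax"],                 "-828(%ebp)"),
    ("mov",    ["-832(%ebp)"],           "%eax"),
    ("lea8",   ["%eax"],                 "%ecx"),
    ("mov",    ["-824(%ebp)"],           "%edx"),
    ("load76", ["%edx"],                 "%eax"),
    ("cltd",   ["%eax"],                 "%edx"),
    ("idiv",   ["%edx", "%eax", "%ecx"], ("%eax", "%edx")),
]


def value_number(instrs):
    """Local value numbering, assuming no two memory operands alias."""
    fresh = count()
    loc_val = {}   # register or stack slot -> value number it holds now
    expr_val = {}  # (op, operand value numbers, result index) -> value number

    def val(loc):
        if loc not in loc_val:          # unknown on entry: a unique value
            loc_val[loc] = next(fresh)
        return loc_val[loc]

    redundant, recomputed = [], []
    for i, (op, srcs, dsts) in enumerate(instrs):
        dsts = (dsts,) if isinstance(dsts, str) else dsts
        if op == "mov":                 # a copy shares its source's number
            seen, new = False, [val(srcs[0])]
        else:
            args = tuple(val(s) for s in srcs)
            keys = [(op, args, n) for n in range(len(dsts))]
            seen = all(k in expr_val for k in keys)
            for k in keys:
                if k not in expr_val:
                    expr_val[k] = next(fresh)
            new = [expr_val[k] for k in keys]
        if all(loc_val.get(d) == v for d, v in zip(dsts, new)):
            redundant.append(i)         # destination already holds the value
        elif seen:
            recomputed.append(i)        # computed earlier, but result was lost
        for d, v in zip(dsts, new):
            loc_val[d] = v
    return redundant, recomputed


redundant, recomputed = value_number(LISTING)
print(redundant, recomputed)  # [11] [13, 14, 15]
```

The output reports index 11 (the second leal) as redundant. Indices 13 to 15 (the repeated load, cltd and idivl) are reported as recomputing values that were available earlier. In principle, keeping the first division's remainder in a spare register would avoid the second idivl, although register pressure on i386 may make that impractical.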

A more minor example in the same unit (dbgdwarf):

     movl    %eax,%esi
     movl    60(%eax),%edx
     movl    -564(%ebp),%eax
     cmpl    72(%eax),%edx
     jl    .Lj359
     movl    60(%esi),%edx
     movl    -564(%ebp),%eax
     cmpl    76(%eax),%edx

This only gets optimised to:

     movl    %eax,%esi
     movl    60(%eax),%edx
     movl    -564(%ebp),%eax
     cmpl    72(%eax),%edx
     jl    .Lj359
     movl    60(%esi),%edx
     cmpl    76(%eax),%edx

This is because the peephole optimiser changes %esi to %eax in the 
"movl 60(%eax),%edx" instruction, since doing so minimises a pipeline 
stall (the load doesn't have to wait for %esi to be written when %eax is 
definitely already available).  If there were a means of leaving a hint 
that %esi = %eax at that point, then it might be possible to optimise it 
further to the ideal:

     movl    %eax,%esi
     movl    60(%eax),%edx
     movl    -564(%ebp),%eax
     cmpl    72(%eax),%edx
     jl    .Lj359
     cmpl    76(%eax),%edx
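The "%esi = %eax" hint can be sketched by tracking the symbolic value each register holds, rather than tracking registers by name. The Python below is a hypothetical, simplified pass; its function and constant names are invented here, not taken from FPC. It understands only movl, cmpl/testl and jumps. Any store to memory starts a new memory "version", so loads are never matched across it. Labels and unmodelled instructions make it forget everything. Because "movl %eax,%esi" gives %esi the same symbol as %eax, the later "movl 60(%esi),%edx" resolves to the value %edx already holds and is dropped.

```python
import re

OPERANDS = re.compile(r",(?![^(]*\))")       # split on commas outside (...)
MEMORY = re.compile(r"(-?\d+)?\((%\w+)\)")   # disp(base) only, no index


def remove_redundant_loads(lines):
    """Drop "movl src,%reg" when %reg is already known to hold src's value.

    Registers are tracked by the symbolic value they hold, so after
    "movl %eax,%esi" the pass knows %esi = %eax (the hint discussed above).
    """
    held = {}      # register -> symbolic value it currently holds
    version = 0    # memory version, bumped on every store

    def value(reg):
        return held.get(reg, ("entry", reg))  # untouched: value on entry

    out = []
    for raw in lines:
        line = raw.strip()
        if not line:
            out.append(raw)
            continue
        op, *rest = line.split(None, 1)
        ops = [o.strip() for o in OPERANDS.split(rest[0])] if rest else []
        if line.endswith(":"):
            held.clear()                      # other paths may join here
        elif op in ("cmpl", "testl") or op.startswith("j"):
            pass                              # read registers, write flags only
        elif op == "movl" and len(ops) == 2 and ops[1].startswith("%"):
            mem = MEMORY.fullmatch(ops[0])
            if mem:                           # load: depends on base and memory
                disp = int(mem.group(1) or 0)
                new = ("mem", version, disp, value(mem.group(2)))
            elif ops[0].startswith("%"):
                new = value(ops[0])           # register copy
            elif ops[0].startswith("$"):
                new = ("imm", ops[0])
            else:
                new = object()                # unmodelled operand: unique value
            if value(ops[1]) == new:
                continue                      # redundant: drop this line
            held[ops[1]] = new
        elif op == "movl" and len(ops) == 2:
            version += 1                      # store: earlier loads may be stale
        else:
            held.clear()                      # anything else: forget everything
        out.append(raw)
    return out


LISTING = ["movl    %eax,%esi", "movl    60(%eax),%edx",
           "movl    -564(%ebp),%eax", "cmpl    72(%eax),%edx",
           "jl    .Lj359", "movl    60(%esi),%edx", "cmpl    76(%eax),%edx"]
for line in remove_redundant_loads(LISTING):
    print(line)
```

Both the original dbgdwarf listing and the partially optimised one reduce to the six-instruction ideal under these assumptions.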

This is what my proposed feature at 
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74 is 
meant to help with.  Its showcase uses the "extra optimisation 
information" to store the state of the upper 32 bits of registers on 
x86_64, so that deeper optimisations can be made knowing whether those 
bits are zero or not.
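The kind of upper-32-bit state described here can be illustrated with a much simpler, hypothetical Python sketch. It is not the merge request's code or data layout. It relies on one architectural fact: on x86_64, any instruction that writes a 32-bit general-purpose register zero-extends the result into the full 64-bit register. A "movl %eax,%eax" that exists only to clear the upper half is therefore removable when that half is already known to be zero. Only a handful of instructions and registers are modelled. Labels and anything unrecognised (cltq, call and so on) conservatively reset the state.

```python
import re

OPERANDS = re.compile(r",(?![^(]*\))")   # split on commas outside (...)
# 32-bit register -> containing 64-bit register (a subset, for brevity).
R32 = {"%eax": "%rax", "%ebx": "%rbx", "%ecx": "%rcx", "%edx": "%rdx",
       "%esi": "%rsi", "%edi": "%rdi", "%r8d": "%r8", "%r9d": "%r9"}
R64 = set(R32.values())
# Instructions whose only register effect is writing their last operand.
MODELLED = {"movl", "movq", "addl", "addq", "subl", "subq", "andl", "andq",
            "orl", "orq", "xorl", "xorq", "leal", "leaq"}


def drop_redundant_zero_extensions(lines):
    """Remove "movl %reg,%reg" when bits 32-63 of the register are known 0."""
    upper_zero = set()                   # 64-bit registers with top half 0
    out = []
    for raw in lines:
        line = raw.strip()
        op, *rest = line.split(None, 1)
        ops = [o.strip() for o in OPERANDS.split(rest[0])] if rest else []
        dst = ops[-1] if ops else None
        if line.endswith(":"):
            upper_zero.clear()           # another path may jump in here
        elif op in ("cmpl", "cmpq", "testl", "testq") or op.startswith("j"):
            pass                         # flags only, no register written
        elif op not in MODELLED:
            upper_zero.clear()           # e.g. cltq, call: be conservative
        elif op == "movl" and ops[0] == dst and R32.get(dst) in upper_zero:
            continue                     # the zero-extension already holds
        elif dst in R32:
            upper_zero.add(R32[dst])     # 32-bit writes zero-extend on x86_64
        elif dst in R64:
            src = ops[0]
            imm = re.fullmatch(r"\$(\d+)", src)
            small_imm = imm is not None and int(imm.group(1)) < 2**32
            if op == "movq" and (src in upper_zero or small_imm):
                upper_zero.add(dst)      # copy or small constant keeps it zero
            else:
                upper_zero.discard(dst)
        out.append(raw)
    return out


LISTING = ["movl    8(%rdi),%eax", "addl    $1,%eax", "movl    %eax,%eax",
           "movq    %rax,%rdx", "movl    %edx,%edx"]
for line in drop_redundant_zero_extensions(LISTING):
    print(line)
```

Copying the state on "movq %rax,%rdx" is what lets the second zero-extension go as well. A real implementation would need to model far more instructions before trusting it.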

Some other things might need some deeper thought:

     movl    -16(%ebp),%edx
     movl    (%edx),%eax
     movl    20(%eax),%eax
     movl    20(%eax),%eax
     movzbl    169(%eax),%eax
     pushl    %eax
     movl    -16(%ebp),%edx
     movl    (%edx),%eax

For some reason, the second "movl -16(%ebp),%edx" isn't removed, even 
though nothing in between writes to %edx.  I'm not sure yet whether this 
is because the sliding window is too small (the first one is itself 
removed because of another "movl -16(%ebp),%edx" that appears earlier, 
so it is that earlier instruction, not this one, that sits in the 
sliding window) or because the compiler makes some incorrect assumptions 
about PUSH instructions and hence thinks the value of %edx is unreliable.
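Both explanations can be explored with a toy model. The Python sketch below is hypothetical and is not FPC's implementation. Its window simply counts preceding instructions, which may not match how FPC's sliding window is filled. The push_is_store flag models whether PUSH is conservatively treated as a memory store that might overwrite -16(%ebp).

```python
import re

OPERANDS = re.compile(r",(?![^(]*\))")          # split on commas outside (...)
KNOWN = {"movl", "movzbl", "leal", "addl", "subl", "cmpl", "testl",
         "pushl", "cltd", "idivl"}
SOURCE_ONLY = {"cmpl", "testl", "pushl", "idivl"}   # operands are only read
IMPLICIT = {"pushl": {"%esp"}, "cltd": {"%edx"}, "idivl": {"%eax", "%edx"}}


def parse(line):
    op, *rest = line.split(None, 1)
    return op, ([o.strip() for o in OPERANDS.split(rest[0])] if rest else [])


def find_reloads(lines, window, push_is_store=True):
    """Indices of "movl mem,%reg" lines that repeat a load still held in %reg.

    At most `window` preceding instructions are examined. The search stops at
    anything that writes %reg or a register used in the address, at any store
    to memory (optionally including PUSH), and at anything not modelled here,
    such as labels and calls.
    """
    insns = [parse(line.strip()) for line in lines]
    found = []
    for i, (op, ops) in enumerate(insns):
        if (op != "movl" or len(ops) != 2 or "(" not in ops[0]
                or not ops[1].startswith("%")):
            continue                             # not a load into a register
        addr_regs = set(re.findall(r"%\w+", ops[0]))
        if ops[1] in addr_regs:
            continue                             # load overwrites its own base
        watch = addr_regs | {ops[1]}
        for j in range(i - 1, max(-1, i - window - 1), -1):
            jop, jops = insns[j]
            if (jop, jops) == (op, ops):
                found.append(i)                  # identical load, still valid
                break
            if jop.startswith("j"):
                continue                         # fall-through: no writes
            if jop not in KNOWN:
                break
            written = set(IMPLICIT.get(jop, ()))
            if jops and jop not in SOURCE_ONLY and jops[-1].startswith("%"):
                written.add(jops[-1])
            stores = (jop == "pushl" and push_is_store) or (
                jop not in SOURCE_ONLY and bool(jops) and "(" in jops[-1])
            if written & watch or stores:
                break
    return found


LISTING = ["movl    -16(%ebp),%edx", "movl    (%edx),%eax",
           "movl    20(%eax),%eax", "movl    20(%eax),%eax",
           "movzbl    169(%eax),%eax", "pushl    %eax",
           "movl    -16(%ebp),%edx", "movl    (%edx),%eax"]
print(find_reloads(LISTING, window=8, push_is_store=False))  # [6]
```

With push_is_store=False and a window of six or more instructions, index 6 (the second "movl -16(%ebp),%edx") is reported as removable. Treating PUSH as a store, or shrinking the window to five, hides it. In the real code the window would presumably have to reach back even further, because the load it must match is the earlier one that caused the first load here to be removed.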

Gareth aka. Kit
