[fpc-devel] Prototype optimisation... Sliding Window
J. Gareth Moreton
gareth at moreton-family.com
Fri Feb 25 15:19:17 CET 2022
On 25/02/2022 08:29, Marco Borsari via fpc-devel wrote:
> This is very useful, thank you.
> I think FPC has an excellent register allocator, but it is frustrated on
> 32-bit by scarce resources and by the lack of a reload check.
Unfortunately, the equivalent procedure isn't optimised on i386-win32:
.Lj679:
movl %eax,%edx
.Lj680:
movl %edx,-832(%ebp)
leal (,%edx,8),%ecx
movl -824(%ebp),%edx
movl 76(%edx),%eax
cltd
idivl %ecx
imull -832(%ebp),%eax
movl %eax,-828(%ebp)
addl 8(%ebp),%eax
movl %eax,-828(%ebp)
movl -832(%ebp),%eax
leal (,%eax,8),%ecx
movl -824(%ebp),%edx
movl 76(%edx),%eax
cltd
idivl %ecx
movl %edx,%esi
The compiler has no way of knowing that -832(%ebp) still holds the value
that %edx had at the start, which is reloaded into %eax in the repeated
sequence (%eax is used in the repeated LEA instead of %edx, although the
optimisation would still fail even if the same register were used).
Many of these optimisations may require a means of adding 'hints' to the
assembly language list to indicate the state of things.
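To illustrate the kind of state such hints would need to carry, here is a
minimal Python sketch of value numbering over the straight-line sequence
from .Lj680 up to the second "movl -824(%ebp),%edx". It is a toy model,
not how FPC's optimiser works: the tuple encoding, the ValueNumbering
class and the decision to treat arithmetic results and loads through
other registers (such as 76(%edx)) as fresh unknown values are all
assumptions, as are the premises that %ebp is never modified and that
distinct stack slots never alias. Because the store to -832(%ebp) records
that the slot holds the original %edx value, the second
"leal (,%eax,8),%ecx" is recognised as recomputing a value that %ecx
already holds.

```python
import itertools


class ValueNumbering:
    """Toy value numbering for a straight-line block (hypothetical model).

    Assumes %ebp is never modified and distinct stack slots never alias.
    """

    def __init__(self):
        self._fresh = itertools.count()
        self.loc = {}    # register or stack slot -> value number it holds
        self.exprs = {}  # (operation, operand value numbers...) -> value number

    def value_of(self, loc):
        # A location read before any write gets a fresh, unknown value.
        if loc not in self.loc:
            self.loc[loc] = next(self._fresh)
        return self.loc[loc]

    def _assign(self, dst, value):
        # Report the write as redundant if dst already holds this value.
        redundant = self.loc.get(dst) == value
        self.loc[dst] = value
        return redundant

    def step(self, insn):
        """Apply one instruction; return True if it is provably redundant."""
        op = insn[0]
        if op == "mov":                   # movl src,dst
            _, src, dst = insn
            return self._assign(dst, self.value_of(src))
        if op == "lea_scale":             # leal (,src,scale),dst
            _, src, scale, dst = insn
            key = ("mul", self.value_of(src), scale)
            if key not in self.exprs:
                self.exprs[key] = next(self._fresh)
            return self._assign(dst, self.exprs[key])
        if op == "clobber":               # anything else: fresh unknown results
            for dst in insn[1]:
                self.loc[dst] = next(self._fresh)
            return False
        raise ValueError(f"unknown operation {op!r}")


block = [
    ("mov", "%edx", "-832(%ebp)"),        # movl %edx,-832(%ebp)
    ("lea_scale", "%edx", 8, "%ecx"),     # leal (,%edx,8),%ecx
    ("mov", "-824(%ebp)", "%edx"),        # movl -824(%ebp),%edx
    ("clobber", ["%eax"]),                # movl 76(%edx),%eax
    ("clobber", ["%edx"]),                # cltd
    ("clobber", ["%eax", "%edx"]),        # idivl %ecx
    ("clobber", ["%eax"]),                # imull -832(%ebp),%eax
    ("mov", "%eax", "-828(%ebp)"),        # movl %eax,-828(%ebp)
    ("clobber", ["%eax"]),                # addl 8(%ebp),%eax
    ("mov", "%eax", "-828(%ebp)"),        # movl %eax,-828(%ebp)
    ("mov", "-832(%ebp)", "%eax"),        # movl -832(%ebp),%eax
    ("lea_scale", "%eax", 8, "%ecx"),     # leal (,%eax,8),%ecx
    ("mov", "-824(%ebp)", "%edx"),        # movl -824(%ebp),%edx
]

vn = ValueNumbering()
flags = [vn.step(insn) for insn in block]
print([n for n, f in enumerate(flags) if f])  # prints [11]
```

Only index 11 (the second leal) is flagged; the second reload of
-824(%ebp) is not, because cltd and idivl overwrite %edx in between.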
A more minor example in the same unit (dbgdwarf):
movl %eax,%esi
movl 60(%eax),%edx
movl -564(%ebp),%eax
cmpl 72(%eax),%edx
jl .Lj359
movl 60(%esi),%edx
movl -564(%ebp),%eax
cmpl 76(%eax),%edx
This only gets optimised to:
movl %eax,%esi
movl 60(%eax),%edx
movl -564(%ebp),%eax
cmpl 72(%eax),%edx
jl .Lj359
movl 60(%esi),%edx
cmpl 76(%eax),%edx
This is because the peephole optimiser changes %esi to %eax in the "movl
60(%eax),%edx" instruction, since doing so minimises a pipeline stall (it
doesn't have to wait for %esi to be loaded when %eax is definitely
loaded). If there were a means of leaving a hint that %esi = %eax at that
point, then it might be possible to optimise it further to the ideal:
movl %eax,%esi
movl 60(%eax),%edx
movl -564(%ebp),%eax
cmpl 72(%eax),%edx
jl .Lj359
cmpl 76(%eax),%edx
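The same idea can be sketched for this example. In the minimal Python
sketch below (hypothetical, with an assumed tuple encoding and function
name), a load such as "movl 60(%esi),%edx" is identified by its offset
and the value number of its base register rather than by the register's
name, so the copy "movl %eax,%esi" acts as the %esi = %eax hint and it no
longer matters which register the peephole optimiser picked. The sketch
follows the fall-through path of "jl .Lj359" and assumes nothing in the
block writes memory (cmpl and jl only read).

```python
import itertools


def find_redundant_loads(block):
    """Return indices of loads whose value is already in the destination.

    Instructions are ("copy", src, dst), ("load", offset, base, dst) or
    ("other",) for read-only instructions such as cmpl and jl.
    Assumes no instruction in the block writes memory.
    """
    fresh = itertools.count()
    regs = {}   # register -> value number it currently holds
    loads = {}  # (offset, value number of base register) -> loaded value

    def val(reg):
        # A register read before any write gets a fresh, unknown value.
        if reg not in regs:
            regs[reg] = next(fresh)
        return regs[reg]

    redundant = []
    for n, insn in enumerate(block):
        kind = insn[0]
        if kind == "copy":            # movl %src,%dst
            _, src, dst = insn
            v = val(src)
        elif kind == "load":          # movl offset(%base),%dst
            _, offset, base, dst = insn
            key = (offset, val(base))
            if key not in loads:
                loads[key] = next(fresh)
            v = loads[key]
        else:                         # reads only, no register writes
            continue
        if regs.get(dst) == v:
            redundant.append(n)
        regs[dst] = v
    return redundant


block = [
    ("copy", "%eax", "%esi"),         # movl %eax,%esi
    ("load", 60, "%eax", "%edx"),     # movl 60(%eax),%edx
    ("load", -564, "%ebp", "%eax"),   # movl -564(%ebp),%eax
    ("other",),                       # cmpl 72(%eax),%edx
    ("other",),                       # jl .Lj359
    ("load", 60, "%esi", "%edx"),     # movl 60(%esi),%edx
    ("load", -564, "%ebp", "%eax"),   # movl -564(%ebp),%eax
    ("other",),                       # cmpl 76(%eax),%edx
]
print(find_redundant_loads(block))    # prints [5, 6]
```

Indices 5 and 6 are "movl 60(%esi),%edx" and the second
"movl -564(%ebp),%eax", the two instructions missing from the ideal
version.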
This is what my proposed feature over at
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74 is
meant to help with (the showcase uses the "extra optimisation
information" to store information on the state of the upper 32 bits of
registers in x86_64, so that it can make deeper optimisations when it
knows whether they are zero or not).
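As a rough illustration of that kind of per-register fact, here is a
minimal Python sketch that tracks a single "upper 32 bits known to be
zero" flag. It relies on a real x86-64 rule (writing a 32-bit register
zeroes bits 63..32 of the full register, while 8- and 16-bit writes leave
them untouched), but the instruction kinds, the function name and the
choice of "movl %eax,%eax" as the zero-extension being removed are
assumptions for illustration; it is not the data structure used in the
merge request.

```python
def redundant_zero_extensions(block):
    """Return indices of 'movl %r32,%r32' zero-extensions that do nothing.

    Each instruction is (kind, reg), where reg names the 64-bit register:
      "write32" -- 32-bit destination (e.g. movl): bits 63..32 become zero
      "write64" -- 64-bit destination (e.g. movq): bits 63..32 unknown
      "partial" -- 8/16-bit destination (e.g. movb): bits 63..32 unchanged
      "zext"    -- movl %r32,%r32 on the same register
    """
    upper_zero = {}  # register -> True if bits 63..32 are known to be zero
    redundant = []
    for n, (kind, reg) in enumerate(block):
        if kind == "zext":
            if upper_zero.get(reg, False):
                redundant.append(n)   # upper half already clear
            upper_zero[reg] = True
        elif kind == "write32":
            upper_zero[reg] = True
        elif kind == "write64":
            upper_zero[reg] = False
        elif kind == "partial":
            pass                      # upper 32 bits are left as they were
        else:
            raise ValueError(f"unknown kind {kind!r}")
    return redundant


block = [
    ("write64", "rax"),  # movq (%rdx),%rax   - upper half unknown
    ("zext", "rax"),     # movl %eax,%eax     - needed
    ("write32", "rcx"),  # movl 8(%rdx),%ecx  - upper half cleared
    ("partial", "rcx"),  # movb $1,%cl        - upper half still clear
    ("zext", "rcx"),     # movl %ecx,%ecx     - redundant
    ("zext", "rax"),     # movl %eax,%eax     - redundant
]
print(redundant_zero_extensions(block))  # prints [4, 5]
```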
Some other things might need some deeper thought:
movl -16(%ebp),%edx
movl (%edx),%eax
movl 20(%eax),%eax
movl 20(%eax),%eax
movzbl 169(%eax),%eax
pushl %eax
movl -16(%ebp),%edx
movl (%edx),%eax
For some reason, the second "movl -16(%ebp),%edx" isn't removed. I'm not
sure yet whether this is because the sliding window is too small (the
first one is removed because of another "movl -16(%ebp),%edx" that
appears earlier, so the first one never enters the sliding window; only
the earlier one does) or because the compiler makes some incorrect
assumptions about PUSH instructions and hence thinks the value of %edx is
unreliable.
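To make the first hypothesis concrete, here is a minimal Python sketch of
a sliding window in which only kept instructions are recorded, as
described above. The encoding, the helper names, the window sizes and the
instruction sequence itself (modelled loosely on the snippet, with an
extra earlier load) are all hypothetical, and it assumes nothing in the
block stores to -16(%ebp). With a small window the earlier load has slid
out by the time the second repeat is reached, so the second repeat
survives even though %edx still holds the right value.

```python
from collections import deque


def remove_reloads(block, window_size):
    """Toy sliding-window pass that drops repeated stack reloads.

    Each instruction is (text, reload, writes):
      reload -- (slot, reg) for a 'movl slot,reg' stack reload, else None
      writes -- set of registers the instruction modifies
    Only kept instructions enter the window, so a dropped reload can never
    be the match for a later one. Assumes nothing stores to the slots.
    """
    window = deque(maxlen=window_size)  # most recent kept instructions
    kept = []
    for insn in block:
        text, reload, _ = insn
        if reload is not None and value_still_loaded(window, reload):
            continue                    # redundant: drop it, window unchanged
        window.append(insn)
        kept.append(text)
    return kept


def value_still_loaded(window, reload):
    """True if an identical reload is in the window and is still valid."""
    _, reg = reload
    for _, prev_reload, prev_writes in reversed(window):
        if prev_reload == reload:
            return True                 # reg still holds the slot's value
        if reg in prev_writes or "%ebp" in prev_writes:
            return False                # reg or the frame pointer changed
    return False                        # match not found (or slid out)


RELOAD = ("-16(%ebp)", "%edx")
block = [
    ("movl -16(%ebp),%edx", RELOAD, {"%edx"}),  # earlier load
    ("movl (%edx),%eax", None, {"%eax"}),
    ("pushl %eax", None, {"%esp"}),
    ("movl -16(%ebp),%edx", RELOAD, {"%edx"}),  # first repeat: dropped
    ("movl (%edx),%eax", None, {"%eax"}),
    ("movl 20(%eax),%eax", None, {"%eax"}),
    ("movl 20(%eax),%eax", None, {"%eax"}),
    ("movzbl 169(%eax),%eax", None, {"%eax"}),
    ("pushl %eax", None, {"%esp"}),
    ("movl -16(%ebp),%edx", RELOAD, {"%edx"}),  # second repeat
    ("movl (%edx),%eax", None, {"%eax"}),
]

for size in (4, 16):
    kept = remove_reloads(block, size)
    print(size, kept.count("movl -16(%ebp),%edx"))
```

This prints "4 2" and then "16 1". In this model pushl only writes %esp,
so it never invalidates %edx; if a real pass instead treated PUSH as
clobbering %edx, the second reload would survive whatever the window
size, which is one way of telling the two explanations apart.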
Gareth aka. Kit