[fpc-devel] Peephole optimizer tai class change proposals
J. Gareth Moreton
gareth at moreton-family.com
Tue Oct 19 21:10:39 CEST 2021
I do hope we can find an acceptable proposal. I'm finding other
potential uses for such extra information that can't really be
replicated in the optimiser any other way, or it will take prohibitively
long to do so (e.g. on the order of O(n²)). For example, take this
assembly snippet:
addl $1,%edi
movq -264(%rbp),%rax
movl %edi,%edx
movq 56(%rax,%rdx,8),%rax
(%rdx deallocated)
At the moment, the optimiser cannot make this any better, but as humans,
we can note that the upper 32 bits of %rdi are zero because of the ADD
instruction, and so improved code would be:
addl $1,%edi
movq -264(%rbp),%rax
movq 56(%rax,%rdi,8),%rax
(Note that even if %edi wasn't used later, we can't merge the ADD
instruction into the reference because of what happens if %edi =
$FFFFFFFF before the ADD instruction)
Currently, the compiler cannot do this optimisation automatically
because while it evaluates "movl %edi,%edx", the peephole optimizer
doesn't know about the state of %rdi - for all it knows, the upper 32
bits could be non-zero. A solution could be for the ADD instruction to
search ahead for the next MOV instruction that uses its output register
and store extra information there to indicate that the upper 32 bits are
definitely zero. There might be other solutions to this particular
optimisation, like extending the capabilities of register-tracking
objects, although i don't know how practical this really is.
A deeper example might be to help track dependency chains to help the
compiler make more informed long-distance optimisations. Currently, I'm
improving some of the optimisations made by OptPass2ADD and OptPass2SUB
under -O3 in the name of breaking dependency chains, which can span a
fair few instructions. However, there are a few cases where it grows the
code size without improving speed. For example:
addl $1,%edi
movq 16(%rbx),%rax
movq 8(%rax),%rcx
movl %edi,%edx
Under my improvements, which are beneficial in most places, this will
change to the following:
leal 1(%edi),%edx
addl $1,%edi
movq 16(%rbx),%rax
movq 8(%rax),%rcx
Because of the dependency chain between "movq 16(%rbx),%rax" and "movq
8(%rax),%rcx" along with using another AGU, this sequence will still
take 2 cycles to execute at best (not counting latency from memory
accesses). Better insight into the dependency chain and the execution
ports used might tell the compiler not to make the optimisation in this
case.
Heh, this is starting to delve beyond simple logic and into the realm of
machine learning if I'm not careful!
Gareth aka. Kit
On 17/10/2021 15:24, J. Gareth Moreton via fpc-devel wrote:
> That's why I was discussing with Jonas in how to handle that, since
> currently tai objects don't have a clean way to free them themselves,
> and optinfo is an untyped Pointer. However, Jonas suggested to have
> the extra info objects stored in a linked list, so the solution I have
> in my showcase is a linked list owned by the TAsmOptimizer object that
> frees everything when it's destroyed. If an instruction is added or
> destroyed, the extra info objects associated with remain in the linked
> list, just not attached to anything. True, there would be dangling
> pointers left in them, but they're solely for searching purposes and
> they generally aren't dereferenced. You need a valid tai object to
> access the relevant extra info.
>
> Granted, it would be cleaner to simply have the extra opt accessed
> through a new field in the tai declaration, and the constructor and
> destructor handle initialisation and cleanup.
>
> Gareth aka. Kit
>
> On 17/10/2021 15:00, Florian Klämpfl via fpc-devel wrote:
>>
>>> Am 11.10.2021 um 10:00 schrieb J. Gareth Moreton via fpc-devel
>>> <fpc-devel at lists.freepascal.org>:
>>>
>>> One for Jonas mainly, but also for Florian. This is a new "extra
>>> optimisation information" feature that allows the peephole optimizer
>>> to leave 'notes' and other extra information on individual tai
>>> objects for later reference. An initial showcase is to store a link
>>> to the destination label if it's not available in the lookup table
>>> (becuase it was created later by a peephole optimisation).
>>>
>>> https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74/diffs
>>>
>>> Currently the showcase doesn't appear to show any additional
>>> optimisations in the x86-64 RTL because the jump optimisation that
>>> creates a new label is almost never called.
>>>
>>> I'll use this feature more extensively in the future, such as for
>>> storing information on the values of registers or making a note of a
>>> label that should be removed if possible because it would cause a
>>> long-term optimisation (something that a peephole optimisation that
>>> removes the label may not be able to determine because its own
>>> optimisation is questionable without that information). See
>>> previous e-mails in this chain for an example.
>> I fear a little bit that this extra info is messed up when
>> instructions are added/removed.
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
>
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the fpc-devel
mailing list