[fpc-devel] Peephole optimizer tai class change proposals

J. Gareth Moreton gareth at moreton-family.com
Tue Oct 19 21:10:39 CEST 2021


I do hope we can find an acceptable proposal.  I'm finding other 
potential uses for such extra information that can't really be 
replicated in the optimiser any other way, or it will take prohibitively 
long to do so (e.g. on the order of O(n²)).  For example, take this 
assembly snippet:

     addl    $1,%edi
     movq    -264(%rbp),%rax
     movl    %edi,%edx
     movq    56(%rax,%rdx,8),%rax
     (%rdx deallocated)

At the moment, the optimiser cannot make this any better, but as humans, 
we can note that the upper 32 bits of %rdi are zero because of the ADD 
instruction, and so improved code would be:

     addl    $1,%edi
     movq    -264(%rbp),%rax
     movq    56(%rax,%rdi,8),%rax

(Note that even if %edi wasn't used later, we can't merge the ADD 
instruction into the reference because of what happens if %edi = 
$FFFFFFFF before the ADD instruction)

Currently, the compiler cannot do this optimisation automatically 
because while it evaluates "movl %edi,%edx", the peephole optimizer 
doesn't know about the state of %rdi - for all it knows, the upper 32 
bits could be non-zero.  A solution could be for the ADD instruction to 
search ahead for the next MOV instruction that uses its output register 
and store extra information there to indicate that the upper 32 bits are 
definitely zero.  There might be other solutions to this particular 
optimisation, like extending the capabilities of register-tracking 
objects, although i don't know how practical this really is.

A deeper example might be to help track dependency chains to help the 
compiler make more informed long-distance optimisations. Currently, I'm 
improving some of the optimisations made by OptPass2ADD and OptPass2SUB 
under -O3 in the name of breaking dependency chains, which can span a 
fair few instructions. However, there are a few cases where it grows the 
code size without improving speed.  For example:

     addl    $1,%edi
     movq    16(%rbx),%rax
     movq    8(%rax),%rcx
     movl    %edi,%edx

Under my improvements, which are beneficial in most places, this will 
change to the following:

     leal    1(%edi),%edx
     addl    $1,%edi
     movq    16(%rbx),%rax
     movq    8(%rax),%rcx

Because of the dependency chain between "movq 16(%rbx),%rax" and "movq 
8(%rax),%rcx" along with using another AGU, this sequence will still 
take 2 cycles to execute at best (not counting latency from memory 
accesses).  Better insight into the dependency chain and the execution 
ports used might tell the compiler not to make the optimisation in this 
case.

Heh, this is starting to delve beyond simple logic and into the realm of 
machine learning if I'm not careful!

Gareth aka. Kit

On 17/10/2021 15:24, J. Gareth Moreton via fpc-devel wrote:
> That's why I was discussing with Jonas in how to handle that, since 
> currently tai objects don't have a clean way to free them themselves, 
> and optinfo is an untyped Pointer.  However, Jonas suggested to have 
> the extra info objects stored in a linked list, so the solution I have 
> in my showcase is a linked list owned by the TAsmOptimizer object that 
> frees everything when it's destroyed.  If an instruction is added or 
> destroyed, the extra info objects associated with remain in the linked 
> list, just not attached to anything.  True, there would be dangling 
> pointers left in them, but they're solely for searching purposes and 
> they generally aren't dereferenced.  You need a valid tai object to 
> access the relevant extra info.
>
> Granted, it would be cleaner to simply have the extra opt accessed 
> through a new field in the tai declaration, and the constructor and 
> destructor handle initialisation and cleanup.
>
> Gareth aka. Kit
>
> On 17/10/2021 15:00, Florian Klämpfl via fpc-devel wrote:
>>
>>> Am 11.10.2021 um 10:00 schrieb J. Gareth Moreton via fpc-devel 
>>> <fpc-devel at lists.freepascal.org>:
>>>
>>> One for Jonas mainly, but also for Florian.  This is a new "extra 
>>> optimisation information" feature that allows the peephole optimizer 
>>> to leave 'notes' and other extra information on individual tai 
>>> objects for later reference.  An initial showcase is to store a link 
>>> to the destination label if it's not available in the lookup table 
>>> (becuase it was created later by a peephole optimisation).
>>>
>>> https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/74/diffs
>>>
>>> Currently the showcase doesn't appear to show any additional 
>>> optimisations in the x86-64 RTL because the jump optimisation that 
>>> creates a new label is almost never called.
>>>
>>> I'll use this feature more extensively in the future, such as for 
>>> storing information on the values of registers or making a note of a 
>>> label that should be removed if possible because it would cause a 
>>> long-term optimisation (something that a peephole optimisation that 
>>> removes the label may not be able to determine because its own 
>>> optimisation is questionable without that information).  See 
>>> previous e-mails in this chain for an example.
>> I fear a little bit that this extra info is messed up when 
>> instructions are added/removed.
>>
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
>

-- 
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



More information about the fpc-devel mailing list