[fpc-devel] Experimentation: "Branch stitching"

J. Gareth Moreton gareth at moreton-family.com
Mon Nov 28 14:32:01 CET 2022


On 28/11/2022 12:59, Martin Frb via fpc-devel wrote:
> On 28/11/2022 07:22, J. Gareth Moreton via fpc-devel wrote:
>> ...
>>     testb   %al,%al
>>     je     .Lj733
>>     subb    $1,%al
>>     je     .Lj734
>>     jmp    .Lj732
>>     .balign 16,0x90
>> .Lj733:
>>     ...
>>     jmp    .Lj718
>>     .balign 16,0x90
>> .Lj732:
>>     movl    $2019050530,%ecx
>>     call    VERBOSE_$$_INTERNALERROR$LONGINT
>>     jmp    .Lj718
>>
>> The block with the internal error can be moved and 'stitched' to the 
>> "jmp .Lj732" instruction.
>>
>>     ...
>>     testb    %al,%al
>>     je    .Lj733
>>     subb    $1,%al
>>     je    .Lj734
>>     movl    $2019050530,%ecx
>>     call    VERBOSE_$$_INTERNALERROR$LONGINT
>>     jmp    .Lj718
>>     .balign 16,0x90
>> .Lj733:
>>     ...
>>
>> I'm still working a few things out, since it can move the function 
>> epilogue which makes things harder to read.  Currently I'm only 
>> moving blocks where the label only has a single reference, thereby 
>> causing a dead label when it's stitched alongside its corresponding 
>> jump.  This avoids problems where the label is referenced in a data 
>> block that's distinct from the assembly and where moving it may cause 
>> problems.
>
> Well first of all, you didn't move the balign in front of .Lj732

I do move the alignment hints, but if the label becomes dead (due to the 
zero-distance jump being 'collapsed'), the alignment hint gets removed.  
It's an experiment in progress.
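
To make the idea concrete, here is a rough Python sketch of the stitching step on a toy list of instructions. It is only an illustration under simplifying assumptions, not the compiler's actual peephole code: the name stitch_branches is hypothetical, labels are assumed to be referenced only by jump instructions (no data-block references), and a block is taken to run from its label to the next unconditional jmp. A block is moved only if its label has a single reference and nothing can fall through into it; the now-dead label and its alignment hint are dropped.

```python
from collections import Counter


def stitch_branches(lines):
    """Toy branch stitching over a list of AT&T-style assembly lines."""
    code = [ln.strip() for ln in lines]

    def jump_target(ins):
        # "je .L1" / "jmp .L1" -> ".L1"; anything else -> None
        parts = ins.split()
        if len(parts) == 2 and parts[0].startswith("j"):
            return parts[1]
        return None

    def is_jmp(ins):
        return ins.split()[:1] == ["jmp"]

    changed = True
    while changed:
        changed = False
        # Count how many jumps reference each label.
        refs = Counter(t for t in map(jump_target, code) if t)
        for i, ins in enumerate(code):
            target = jump_target(ins)
            if not is_jmp(ins) or refs[target] != 1 or target + ":" not in code:
                continue
            start = code.index(target + ":")
            # The block ends at the first unconditional jmp after the label.
            end = start + 1
            while (end < len(code) and not is_jmp(code[end])
                   and not code[end].endswith(":")):
                end += 1
            if end == len(code) or not is_jmp(code[end]):
                continue  # block falls through into other code: leave it
            # Take the alignment hint in front of the label along with it.
            has_align = start > 0 and code[start - 1].startswith(".balign")
            first = start - 1 if has_align else start
            if first == 0 or not is_jmp(code[first - 1]):
                continue  # something could fall through into the block
            if first <= i <= end:
                continue  # the jump sits inside the block itself
            # Body without the (now dead) label and alignment hint.
            body = code[start + 1:end + 1]
            if i < first:
                code = code[:i] + body + code[i + 1:first] + code[end + 1:]
            else:
                code = code[:first] + code[end + 1:i] + body + code[i + 1:]
            changed = True
            break
    return code


example = [
    "testb %al,%al",
    "je .Lj733",
    "subb $1,%al",
    "je .Lj734",
    "jmp .Lj732",
    ".balign 16,0x90",
    ".Lj733:",
    "nop",
    "jmp .Lj718",
    ".balign 16,0x90",
    ".Lj732:",
    "movl $2019050530,%ecx",
    "call VERBOSE_$$_INTERNALERROR$LONGINT",
    "jmp .Lj718",
]

if __name__ == "__main__":
    print("\n".join(stitch_branches(example)))
```

Run on the listing quoted above (with a nop standing in for the elided part), the internal error block replaces the "jmp .Lj732" instruction, and the .Lj732 label and its .balign disappear because nothing references them any more.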

> In the above example, that may be an improvement (most likely) because 
> if the label really is referred once only (and thereby is also not a 
> loop) then it may not be beneficial to align it (except maybe if the 
> user specified a non default align?).
> If the label is referred only once, but the whole thing is inside a 
> loop .... it may still be relevant to have the align? (not sure, 
> depends on how the cpu caches stuff)?
>
> Another thing is, that moving the block can make the other part of the 
> loop longer (needing more cache). If this branch-to-be-moved is rarely 
> entered, it may want to be after the final "jmp-to-loop-start" of the 
> normal branch?
> Of course, if the loop is bigger than the block with the branches, and 
> we did know that the branch is some sort of exception only, then we 
> would want to move it even further away, to get it out of the loop......
It's a good point.  I'll have to work out which situations will be fine 
and which will increase cache usage.  How is a procedure loaded into the 
CPU cache?  Is there some good documentation on this?  I've always 
wondered whether the whole thing, or at least as much of it as possible, 
is loaded sequentially, with the alignment hints mostly there to avoid 
partial reads.
>
> --------------
> Btw, .balign N, 0x90 => isn't there an align that uses multibyte nop 
> (like) instructions? (I posted some pdf to you a while back, iirc it 
> points that out)

There is - it's the .p2align directive.  I'm not sure why the compiler 
mixes and matches them though.

Kit

