[fpc-devel] Experimentation: "Branch stitching"

Martin Frb lazarus at mfriebe.de
Mon Nov 28 13:59:47 CET 2022


On 28/11/2022 07:22, J. Gareth Moreton via fpc-devel wrote:
> ...
>     testb   %al,%al
>     je     .Lj733
>     subb    $1,%al
>     je     .Lj734
>     jmp    .Lj732
>     .balign 16,0x90
> .Lj733:
>     ...
>     jmp    .Lj718
>     .balign 16,0x90
> .Lj732:
>     movl    $2019050530,%ecx
>     call    VERBOSE_$$_INTERNALERROR$LONGINT
>     jmp    .Lj718
>
> The block with the internal error can be moved and 'stitched' to the 
> "jmp .Lj732" instruction.
>
>     ...
>     testb    %al,%al
>     je    .Lj733
>     subb    $1,%al
>     je    .Lj734
>     movl    $2019050530,%ecx
>     call    VERBOSE_$$_INTERNALERROR$LONGINT
>     jmp    .Lj718
>     .balign 16,0x90
> .Lj733:
>     ...
>
> I'm still working a few things out, since it can move the function 
> epilogue which makes things harder to read.  Currently I'm only moving 
> blocks where the label only has a single reference, thereby causing a 
> dead label when it's stitched alongside its corresponding jump.  This 
> avoids problems where the label is referenced in a data block that's 
> distinct from the assembly and where moving it may cause problems.
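
To make the transformation concrete, here is a minimal sketch in Python of the single-reference stitching described in the quote. It is not FPC's optimizer. It assumes the code is a flat list of stripped, non-empty AT&T lines, and all names (`stitch_once`, `count_refs`, etc.) are hypothetical. Beyond the single-reference rule, the sketch also refuses blocks that could be entered by fall-through or that do not end in `jmp`/`ret`, and it drops the `.balign` in front of the moved label.

```python
import re

# Matches an unconditional jump to a local label, e.g. "jmp .Lj732".
JMP_TO_LABEL = re.compile(r"^jmp\s+(\.L\w+)$")


def is_directive(line):
    # Assembler directives such as ".balign 16,0x90"; labels end with ":".
    return line.startswith(".") and not line.endswith(":")


def is_unconditional(line):
    # Instructions after which control never falls through.
    return line.split()[0] in ("jmp", "ret")


def count_refs(lines, label):
    # Count operand uses of `label` in this listing only. References from
    # elsewhere (e.g. a jump table in a data section) are invisible here.
    return sum(re.split(r"[\s,]+", line).count(label)
               for line in lines if line != label + ":")


def stitch_once(lines, drop_align=True):
    """Stitch the first eligible block onto the jmp that targets it.

    Returns (new_lines, changed)."""
    for i, line in enumerate(lines):
        m = JMP_TO_LABEL.match(line)
        if not m:
            continue
        target = m.group(1)
        if count_refs(lines, target) != 1:
            continue  # the label must become dead once stitched
        if target + ":" not in lines:
            continue
        start = lines.index(target + ":")
        if start <= i:
            continue  # this sketch only handles blocks placed after the jmp
        # The block must not be entered by fall-through from above.
        k = start - 1
        while k >= 0 and is_directive(lines[k]):
            k -= 1
        if k < 0 or not is_unconditional(lines[k]):
            continue
        # The block must end in an unconditional transfer, otherwise moving
        # it would change what it falls into.
        end = start + 1
        while end < len(lines) and not is_unconditional(lines[end]):
            end += 1
        if end == len(lines):
            continue
        block = lines[start + 1:end + 1]  # the dead label itself is dropped
        cut_from = start
        if drop_align and lines[start - 1].startswith(".balign"):
            cut_from = start - 1  # the moved code is no longer aligned
        rest = lines[:cut_from] + lines[end + 1:]
        return rest[:i] + block + rest[i + 1:], True
    return list(lines), False
```

Applied to the first listing above (with the `...` body of .Lj733 reduced to its `jmp .Lj718`), it produces the stitched version from the quote; calling it repeatedly until `changed` is False would also handle chains.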

Well, first of all, you didn't move the .balign in front of .Lj732.

In the above example, that is most likely an improvement, because if 
the label really is referred to only once (and thereby is also not a 
loop), then it may not be beneficial to align it (except, maybe, if the 
user specified a non-default alignment?).
If the label is referred to only once, but the whole thing is inside a 
loop, it may still be relevant to keep the alignment? (Not sure; it 
depends on how the CPU caches things.)

Another thing is that moving the block can make the other part of the 
loop longer (needing more cache). If this branch-to-be-moved is rarely 
entered, it may be better placed after the final "jmp-to-loop-start" of 
the normal branch?
Of course, if the loop is bigger than the block with the branches, and 
we knew that the branch is some sort of exception-only path, then we 
would want to move it even further away, to get it out of the loop...

--------------
Btw, .balign N, 0x90 => isn't there an align that uses multi-byte 
NOP(-like) instructions? (I sent you a PDF a while back; IIRC it points 
that out.)
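
For illustration, a sketch of padding with the multi-byte NOP forms the Intel SDM recommends (1 to 9 bytes, listed on the NOP instruction page in Vol. 2) instead of one 0x90 per byte. The helper name `nop_padding` is made up; it greedily uses the longest NOP that fits. If I remember correctly, GNU as already pads code sections with no-ops when the fill value of .balign is omitted, so simply dropping the ",0x90" might be enough, but that would need checking for whichever assembler (internal or external) FPC uses.

```python
# Recommended multi-byte NOP encodings, 1 to 9 bytes, as listed in the
# Intel SDM (Vol. 2, NOP instruction).
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def nop_padding(offset, align):
    """Bytes to emit at `offset` so the next instruction starts on an
    `align`-byte boundary, using the longest NOPs that fit."""
    gap = -offset % align  # bytes missing up to the next boundary
    out = bytearray()
    while gap:
        n = min(gap, 9)
        out += NOPS[n]
        gap -= n
    return bytes(out)
```

With 16-byte alignment and code ending at offset 5, this gives 11 padding bytes as two instructions (9 + 2 bytes) rather than eleven separate single-byte NOPs.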

