[fpc-devel] Experimentation: "Branch stitching"
J. Gareth Moreton
gareth at moreton-family.com
Mon Nov 28 14:32:01 CET 2022
On 28/11/2022 12:59, Martin Frb via fpc-devel wrote:
> On 28/11/2022 07:22, J. Gareth Moreton via fpc-devel wrote:
>> ...
>> testb %al,%al
>> je .Lj733
>> subb $1,%al
>> je .Lj734
>> jmp .Lj732
>> .balign 16,0x90
>> .Lj733:
>> ...
>> jmp .Lj718
>> .balign 16,0x90
>> .Lj732:
>> movl $2019050530,%ecx
>> call VERBOSE_$$_INTERNALERROR$LONGINT
>> jmp .Lj718
>>
>> The block with the internal error can be moved and 'stitched' to the
>> "jmp .Lj732" instruction.
>>
>> ...
>> testb %al,%al
>> je .Lj733
>> subb $1,%al
>> je .Lj734
>> movl $2019050530,%ecx
>> call VERBOSE_$$_INTERNALERROR$LONGINT
>> jmp .Lj718
>> .balign 16,0x90
>> .Lj733:
>> ...
>>
>> I'm still working a few things out, since it can move the function
>> epilogue which makes things harder to read. Currently I'm only
>> moving blocks where the label only has a single reference, thereby
>> causing a dead label when it's stitched alongside its corresponding
>> jump. This avoids problems where the label is referenced in a data
>> block that's distinct from the assembly and where moving it may cause
>> problems.
>
> Well first of all, you didn't move the balign in front of .Lj732
I do move the alignment hints, but if the label becomes dead (due to the
zero-distance jump being 'collapsed'), the alignment hint gets removed.
It's an experiment in progress.
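To make the idea concrete, here is a minimal sketch of the stitching step in Python. It is not the compiler's implementation: instructions are modelled as plain strings, and the function name `stitch_branches` and the block-detection rules (a block starts at a label that cannot be fallen into and ends at the first unconditional `jmp` or `ret`) are assumptions made purely for illustration.

```python
def stitch_branches(lines):
    """Move single-reference blocks to sit in place of the jmp that targets them.

    Hypothetical model: each element is one line of AT&T-style assembly.
    """
    lines = list(lines)
    changed = True
    while changed:
        changed = False
        # Count how many jump instructions reference each label.
        refs = {}
        for s in lines:
            parts = s.split()
            if len(parts) == 2 and parts[0].startswith("j"):
                refs[parts[1]] = refs.get(parts[1], 0) + 1
        for i, s in enumerate(lines):
            parts = s.split()
            if len(parts) != 2 or parts[0] != "jmp":
                continue
            label = parts[1]
            # Only stitch when the label becomes dead afterwards.
            if refs.get(label) != 1 or label + ":" not in lines:
                continue
            start = lines.index(label + ":")
            # Skip back over alignment hints; the block must not be
            # reachable by fall-through, so a jmp has to precede it.
            p = start - 1
            while p >= 0 and lines[p].startswith(".balign"):
                p -= 1
            if p < 0 or not lines[p].startswith("jmp "):
                continue
            align_start = p + 1
            # The block ends at the first unconditional jmp or ret.
            end = start + 1
            terminated = False
            while end < len(lines):
                t = lines[end]
                if t.endswith(":") or t.startswith("."):
                    break  # another label or directive: give up
                if t.startswith("jmp ") or t == "ret":
                    terminated = True
                    break
                end += 1
            if not terminated or align_start <= i <= end:
                continue
            body = lines[start + 1:end + 1]
            stitched = []
            for k, t in enumerate(lines):
                if k == i:
                    stitched.extend(body)  # block replaces the jump
                elif align_start <= k <= end:
                    continue  # drop dead label, its alignment, old block
                else:
                    stitched.append(t)
            lines = stitched
            changed = True
            break
    return lines
```

Fed the listing quoted above (with a `nop` standing in for the elided instructions), this drops "jmp .Lj732", the ".Lj732:" label and the alignment hint in front of it, and leaves the internal error block directly after "je .Lj734".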
> In the above example, that may be an improvement (most likely) because
> if the label really is referred once only (and thereby is also not a
> loop) then it may not be beneficial to align it (except maybe if the
> user specified a non default align?).
> If the label is referred only once, but the whole thing is inside a
> loop .... it may still be relevant to have the align? (not sure,
> depends on how the cpu caches stuff)?
>
> Another thing is, that moving the block can make the other part of the
> loop longer (needing more cache). If this branch-to-be-moved is rarely
> entered, it may want to be after the final "jmp-to-loop-start" of the
> normal branch?
> Of course, if the loop is bigger than the block with the branches, and
> we did know that the branch is some sort of exception only, then we
> would want to move it even further away, to get it out of the loop......
It's a good point. I'll have to work out which situations will be fine
and which will increase cache usage. How is a procedure loaded into the
CPU cache? Is there some good documentation on this? I always wondered
whether the whole thing, or at least as much as possible, is loaded
sequentially, with the alignment hints mostly there to avoid partial reads.
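As a rough illustration of the "partial reads" concern, the arithmetic below counts how many cache lines a block of code touches for a given start offset. The 64-byte line size is an assumption (it is typical of current x86-64 parts, but not universal), the helper names are hypothetical, and this says nothing about how the CPU actually fetches or prefetches code.

```python
def align_up(addr, alignment):
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def cache_lines_touched(start, size, line_size=64):
    """Number of cache lines covered by bytes [start, start + size)."""
    if size <= 0:
        return 0
    first = start // line_size
    last = (start + size - 1) // line_size
    return last - first + 1


# A 20-byte block starting at offset 60 straddles two 64-byte lines;
# after a 16-byte alignment it starts at offset 64 and fits in one.
unaligned = cache_lines_touched(60, 20)
aligned = cache_lines_touched(align_up(60, 16), 20)
```

The trade-off is that the padding itself takes up space, so aligning every label can push the rest of the procedure across more lines overall.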
>
> --------------
> Btw, .balign N, 0x90 => isn't there an align that uses multibyte nop
> (like) instructions? (I posted some pdf to you a while back, iirc it
> points that out)
There is - it's the .p2align directive. I'm not sure why the compiler
mixes and matches them though.
Kit