[fpc-devel] Experimentation: "Branch stitching"
J. Gareth Moreton
gareth at moreton-family.com
Mon Nov 28 16:19:30 CET 2022
I admit I can be disorganised sometimes and lose documents, so I
apologise if you have already sent them and I mislaid them in my mess of
a directory tree. Believe me, though, I want to swallow all of this up
if it means squeezing every cycle I can out of the generated machine
code!
Curious to know... at which point did it become favourable to do a
32-byte align rather than a 16-byte align on x86 processors? Should the
compiler start favouring 32-byte aligns for loops, say?
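To make the trade-off concrete, here is a minimal Python sketch under a deliberately simplified model. It assumes code is mapped into the uop cache in aligned 32-byte windows (the point in the superuser.com note Martin quotes) and ignores every other uop-cache limit. The helper names `padding_for` and `windows_touched`, the 20-byte loop size and the start addresses are all hypothetical. It computes how many padding bytes a 16-byte or 32-byte alignment costs and how many 32-byte windows the loop body then touches.

```python
# Simplified model only: padding cost of an alignment directive, and how many
# 32-byte windows a loop body touches afterwards. Other uop-cache limits
# (ways per window, uops per way, capacity) are NOT modelled.

WINDOW = 32  # assumed window size, per the 32B-window note in the thread


def padding_for(addr: int, align: int) -> int:
    """Bytes a .balign-style directive would insert to reach the next
    multiple of `align` (0 if `addr` is already aligned)."""
    return (-addr) % align


def windows_touched(start: int, length: int, window: int = WINDOW) -> int:
    """Number of distinct aligned `window`-byte blocks covered by
    the byte range [start, start + length)."""
    if length <= 0:
        return 0
    first = start // window
    last = (start + length - 1) // window
    return last - first + 1


if __name__ == "__main__":
    loop_len = 20  # hypothetical loop body size in bytes
    for addr in (0x1003, 0x1013):  # hypothetical addresses before alignment
        for align in (16, 32):
            pad = padding_for(addr, align)
            start = addr + pad
            print(f"addr={addr:#x} align={align:2}: pad={pad:2} "
                  f"start={start:#x} windows={windows_touched(start, loop_len)}")
```

With these example addresses, the 16-byte alignment lands on a 32-byte boundary for 0x1013 but not for 0x1003, where the 20-byte body then straddles two windows. The 32-byte alignment gives one window in both cases, at the cost of up to 31 padding bytes (29 for 0x1003). That is one way to read the "50/50 game" remark: a 16-byte-aligned address is also 32-byte aligned only half the time.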
Kit
On 28/11/2022 13:52, Martin Frb via fpc-devel wrote:
> On 28/11/2022 14:32, J. Gareth Moreton via fpc-devel wrote:
>> On 28/11/2022 12:59, Martin Frb via fpc-devel wrote:
>>> Well first of all, you didn't move the balign in front of .Lj732
>>
>> I do move the alignment hints, but if the label becomes dead (due to
>> the zero-distance jump being 'collapsed'), the alignment hint gets
>> removed. It's an experiment in progress.
>
> Ah, yes right.
> Anyway, this may be more of a 32-byte thing, and the 16-byte align is
> at best a 50/50 game.
>
> I once had a better source on the topic (it might also be in the PDF I
> once sent), but for now:
> https://superuser.com/questions/1368480/how-is-the-micro-op-cache-tagged
>
>> Each 32B window (from the instruction cache) is mapped into the uop
>> cache
> (In the case of an outer loop) due to the size of that cache, and
> depending on what else is executed, uops may or may not be cached. This
> also only matters if the moved block is entered frequently (i.e. it is
> inside a loop).
> But ultimately, the 16-byte aligns are not meant for that, though if a
> user set a 32-byte align with a directive, then it may matter.
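The clean-up described in the quoted exchange (a jump whose target is the very next label is collapsed, and if that label then becomes dead, its alignment hint is dropped too) can be modelled as a small pass over an instruction list. This is a hypothetical Python sketch, not FPC's actual peephole optimizer: the tuple representation, the name `collapse_zero_distance_jumps`, and the assumption that labels are referenced only by `jmp`/`jcc` entries are all inventions for illustration.

```python
# Hypothetical model of collapsing zero-distance jumps. NOT FPC code; the
# (kind, operand) tuple format is invented. Assumes labels are referenced
# only by "jmp"/"jcc" entries in this list (no jump tables, no other users).
from typing import List, Tuple

Instr = Tuple[str, str]  # kind is one of "jmp", "jcc", "label", "align", "op"
JUMPS = ("jmp", "jcc")


def collapse_zero_distance_jumps(code: List[Instr]) -> List[Instr]:
    # Pass 1: drop an unconditional jump whose target label is the next
    # entry once intervening alignment hints are skipped.
    out: List[Instr] = []
    for i, (kind, arg) in enumerate(code):
        if kind == "jmp":
            j = i + 1
            while j < len(code) and code[j][0] == "align":
                j += 1
            if j < len(code) and code[j] == ("label", arg):
                continue  # the jump goes nowhere: collapse it
        out.append((kind, arg))

    # Pass 2: remove labels no remaining jump refers to, together with the
    # alignment hint(s) placed directly in front of them.
    live = {arg for kind, arg in out if kind in JUMPS}
    result: List[Instr] = []
    for kind, arg in out:
        if kind == "label" and arg not in live:
            while result and result[-1][0] == "align":
                result.pop()
            continue
        result.append((kind, arg))
    return result


if __name__ == "__main__":
    code = [("op", "mov"), ("jmp", ".Lj732"), ("align", "16"),
            ("label", ".Lj732"), ("op", "ret")]
    print(collapse_zero_distance_jumps(code))
```

If another jump still targets the label, the label and its alignment hint are kept; only a label that becomes dead loses its alignment. A real optimizer would also need to respect label references outside plain jumps (jump tables, for instance), which this sketch deliberately ignores.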