[fpc-devel] Experimentation: "Branch stitching"

J. Gareth Moreton gareth at moreton-family.com
Mon Nov 28 16:19:30 CET 2022

I admit I can be disorganised sometimes and lose documents, so I 
apologise if you have sent them already and I mislaid them in my mess of 
a directory tree.  Believe me though, I want to swallow all of this up 
if it means squeezing every cycle I can out of the generated machine 
code.

Curious to know... at which point did it become favourable to do a 
32-byte align rather than a 16-byte align on x86 processors? Should the 
compiler start favouring 32-byte aligns for loops, say?
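
To make the question concrete, here is a rough sketch of the arithmetic, 
assuming the model from the link quoted below, in which the micro-op 
cache is filled per 32-byte window of the instruction stream. The 
address 0x1009 and the 32-byte loop body are made-up values for 
illustration, and the helper names are hypothetical; nothing here comes 
from the compiler itself.

```python
def align_up(addr: int, alignment: int) -> int:
    """Round addr up to the next multiple of alignment (a power of two)."""
    return (addr + alignment - 1) & ~(alignment - 1)


def windows_touched(start: int, size: int, window: int = 32) -> int:
    """Count the aligned `window`-byte blocks overlapped by [start, start + size)."""
    if size <= 0:
        return 0
    first = start // window
    last = (start + size - 1) // window
    return last - first + 1


# Hypothetical example: a 32-byte loop body whose label would otherwise
# fall at 0x1009 before any alignment padding is inserted.
label = 0x1009
loop_size = 32

for alignment in (16, 32):
    start = align_up(label, alignment)
    count = windows_touched(start, loop_size)
    print(f"{alignment}-byte align: start={start:#x}, 32B windows touched={count}")
```

In this made-up case the 16-byte align places the loop at 0x1010, where 
it straddles two 32-byte windows, while the 32-byte align places it at 
0x1020, where it fits in one. A 16-byte-aligned address is also 
32-byte-aligned only half of the time, which matches the "50/50" remark 
quoted below.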


On 28/11/2022 13:52, Martin Frb via fpc-devel wrote:
> On 28/11/2022 14:32, J. Gareth Moreton via fpc-devel wrote:
>> On 28/11/2022 12:59, Martin Frb via fpc-devel wrote:
>>> Well first of all, you didn't move the balign in front of .Lj732
>> I do move the alignment hints, but if the label becomes dead (due to 
>> the zero-distance jump being 'collapsed'), the alignment hint gets 
>> removed.  It's an experiment in progress.
> Ah, yes right.
> Anyway this may be more of a 32 byte thing, and the 16 byte align is 
> at best a 50/50 game
> I once had a better source on the topic (also it might be in the pdf I 
> once sent) but for now:
> https://superuser.com/questions/1368480/how-is-the-micro-op-cache-tagged
>> Each 32B window (from the instruction cache) is mapped into the uop 
>> cache
> (in case of an outer loop) Due to the size of that cache, and depending 
> on what else is executed, uops may or may not be cached (this also only 
> matters if the moved block is inside a loop and frequently entered).
> But ultimately, the 16-byte aligns are not meant for that. Though if a 
> user used a directive to set a 32-byte align => then that may matter.