[fpc-devel] inline... and philosophy
margers.roked at inbox.lv
Fri Nov 22 15:22:19 CET 2019
> Does that mean in some situations, if you have a small, tight loop, it
> might be better to optimise for size over speed in some very rare cases? For
> example, turning MOV EAX, $FFFFFFFF into OR EAX, $FF to squeeze out a
> few extra bytes, even though the instruction introduces a false dependency.
A latency of 4 clock cycles is a lot. As long as the dependency can be resolved in a shorter time, there will be some performance gain.
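To make the size side of the quoted question concrete, here is a rough sketch in C that compares the two encodings byte for byte. It assumes the quoted "OR EAX, $FF" means the form with a sign-extended 8-bit immediate (that is, OR EAX, -1); the array and function names are made up for illustration. The models only show that both instructions leave the same value in EAX, and that the OR form has to read the old EAX to do so, which is where the false dependency comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* MOV EAX, $FFFFFFFF: opcode B8 followed by a 4-byte immediate (5 bytes). */
static const uint8_t MOV_EAX_ALL_ONES[] = { 0xB8, 0xFF, 0xFF, 0xFF, 0xFF };

/* OR EAX, -1: opcode 83 /1, ModRM C8 (register EAX), imm8 FF (3 bytes).
   The imm8 value $FF is sign-extended to $FFFFFFFF by the CPU. */
static const uint8_t OR_EAX_ALL_ONES[] = { 0x83, 0xC8, 0xFF };

/* Bytes saved by choosing the OR form over the MOV form. */
static size_t bytes_saved(void) {
    return sizeof MOV_EAX_ALL_ONES - sizeof OR_EAX_ALL_ONES;
}

/* Model of MOV: the old value of EAX is never read. */
static uint32_t eax_after_mov(uint32_t old_eax) {
    (void)old_eax;
    return 0xFFFFFFFFu;
}

/* Model of OR: the old value of EAX is an input, even though
   the result does not actually depend on it. */
static uint32_t eax_after_or(uint32_t old_eax) {
    return old_eax | 0xFFFFFFFFu;
}
```

So the saving is 2 bytes per occurrence; whether that is worth the extra input depends on how soon the old EAX becomes available.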
That performance penalty is not a fixed 20%. It depends on what code you have before it: long-latency instructions have time to catch up with the rest of the code. It is possible to cancel the penalty out completely by placing the call so that the ret falls into the next 64-byte line.
It's a place where tricky optimizations can be done.
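As a rough illustration of the 64-byte line arithmetic, the sketch below computes which line an address falls into and how much padding would push code to the start of the next line. It rests on several assumptions of mine: that "line" means an aligned 64-byte block of code addresses, that "ret" refers to the return address right after the call, and that the call is a 5-byte near call (E8 rel32). All names are hypothetical.

```c
#include <stdint.h>

#define LINE_SIZE 64u      /* line size taken from the post */
#define NEAR_CALL_LEN 5u   /* assumption: E8 rel32 near call */

/* Index of the aligned 64-byte line that contains addr. */
static uint64_t line_index(uint64_t addr) {
    return addr / LINE_SIZE;
}

/* Padding bytes needed so that code at addr starts the next line
   (0 if addr is already line-aligned). */
static uint64_t pad_to_next_line(uint64_t addr) {
    return (LINE_SIZE - addr % LINE_SIZE) % LINE_SIZE;
}

/* Address execution resumes at after a near call placed at call_addr. */
static uint64_t return_address(uint64_t call_addr) {
    return call_addr + NEAR_CALL_LEN;
}

/* Nonzero if the return address lies in a later line than the call. */
static int return_in_next_line(uint64_t call_addr) {
    return line_index(return_address(call_addr)) > line_index(call_addr);
}
```

For example, a call placed at $103B returns to $1040, which starts a new line, while a call at $1030 returns to $1035 in the same line and would need 11 bytes of padding in front of it to reach $103B.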