[fpc-devel] Double-checking an optimisation

Sun Jan 9 02:47:32 CET 2022

On 09/01/2022 01:37, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> So a merge request of mine was just approved that allows the peephole 
> optimizer access to more registers when it needs one for temporary 
> storage.  It allows it to make an optimisation on x86_64-win64 that 
> wasn't possible before due to the lack of available volatile 
> registers.  In packages\numlib\src\dsl.pas - before:
>
> .Lj184:
>     ...
>     cmpl    $1,%ecx
>     jng    .Lj188
>     subl    $1,%ecx
> .Lj188:
>     ...
>
> After:
>
> .Lj184:
>     ...
>     cmpl    $1,%ecx
>     setg    %bl
>     movzbl    %bl,%ebx
>     subl    %ebx,%ecx
>     ...
>
> %ebx is a non-volatile register, but the current subroutine preserves 
> it and it's not currently in use, so the peephole optimizer can borrow 
> it for a few instructions.
>
> I need to double-check though... is this actually a good optimisation 
> for speed?  It removes a jump and a label, which might permit other 
> long-range optimisations, but it's 3 instructions that are in a 
> dependency chain.
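
For reference, the two sequences quoted above can be modelled in a few lines of Python to check that they leave the same value in %ecx. This is only a sketch of the semantics, not compiler output: the function names are made up for illustration, and the inputs are assumed to be signed 32-bit values, since jng and setg both test the signed "greater" condition.

```python
def branch_form(ecx):
    # cmpl $1,%ecx ; jng .Lj188 ; subl $1,%ecx
    if ecx > 1:          # the jng skips the subtraction when ecx <= 1
        ecx -= 1
    return ecx

def branchless_form(ecx):
    # cmpl $1,%ecx ; setg %bl ; movzbl %bl,%ebx ; subl %ebx,%ecx
    ebx = int(ecx > 1)   # setg + movzbl leave 0 or 1 in %ebx
    return ecx - ebx     # the subtraction now always executes

# Values on both sides of the comparison, including the signed extremes.
samples = [-2**31, -5, -1, 0, 1, 2, 3, 100, 2**31 - 1]
mismatches = [v for v in samples if branch_form(v) != branchless_form(v)]
print("mismatches:", mismatches)
```

The model says nothing about speed; it only shows that the conditional subtraction becomes an unconditional subtraction of 0 or 1.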

I take it the new sequence is also one (or two?) bytes longer? If it is
in a loop that otherwise fits exactly within a 32-byte aligned block,
that could cause a slowdown too. (If the loop is 16 bytes long but
aligned to a 32-byte boundary + 16, it may slow down when the loop's
code size goes from 16 to 17 bytes, because that is when it crosses the
boundary of the 32-byte block.)
This is a bit hard to predict. But within very small loops (even 2 or
maybe 3 blocks of 32 bytes), size may be just as important. (Actually a
good question: which weighs more...)
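
The boundary arithmetic from the 16 vs. 17 byte example can be sketched as follows. This is only an illustration: blocks_spanned is a made-up helper, and the 32-byte block size is the figure assumed above; the actual fetch or decode window depends on the CPU.

```python
def blocks_spanned(start, size, block=32):
    """Number of aligned blocks touched by code at [start, start + size)."""
    first = start // block
    last = (start + size - 1) // block
    return last - first + 1

# Loop starting at a 32-byte boundary + 16, as in the example above.
start = 32 * 4 + 16
for size in (16, 17):
    print(size, "bytes ->", blocks_spanned(start, size), "block(s)")
```

With these assumptions the 16-byte loop stays within one block and the 17-byte loop touches two, which is the case where a single extra byte might matter.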