[fpc-devel] Register renaming and false dependency question

Mon Oct 18 00:53:41 CEST 2021

According to Compiler Explorer, clang, gcc and msvc compile this to the 
same code with -O3 as FPC does, so I would assume that is fine.
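
For anyone who wants to repeat the comparison, a C counterpart of the 
quoted Pascal function might look like the sketch below. The name 
is_divisible_3 is my own assumption and not from the original post; only 
the two constants are taken from it. There is no main() because Compiler 
Explorer only needs the function itself.

```c
#include <stdint.h>

/* Hypothetical C counterpart of IsDivisible3 from the quoted Pascal code.
   0xAAAAAAAAAAAAAAAB is the multiplicative inverse of 3 modulo 2^64, so
   multiplying a multiple of 3 (n = 3k) by it yields k, which is at most
   0x5555555555555555.  Non-multiples wrap around to larger values. */
static int is_divisible_3(uint64_t n)
{
    /* Unsigned multiply (wraps modulo 2^64) and unsigned comparison. */
    return (n * UINT64_C(0xAAAAAAAAAAAAAAAB)) <= UINT64_C(0x5555555555555555);
}
```

The comparison has to be unsigned, as it is for QWord in the Pascal 
version; with a signed comparison every product with the top bit set 
would wrongly count as "less than or equal".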

On 17.10.2021 at 13:25, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
>
> While reading up on some algorithms, I came across a recommendation to 
> use a shorter arithmetic instruction to change the value of a constant 
> in a register rather than loading the new value directly.  However, 
> the algorithm assumes a RISC-like processor, so I'm not sure whether it 
> applies to an Intel x86-64 processor.  Consider the following:
>
> movq    $0xaaaaaaaaaaaaaaab,%rax
> imulq   %rax,%rcx
> movq    $0x5555555555555555,%rax
> cmpq    %rax,%rcx
> setbe   %al
>
> This algorithm sets %al to 1 if %rcx is divisible by 3, and 0 if it's 
> not, and was compiled from the following Pascal code (under -O3, but 
> -O1 produces almost exactly the same):
>
> function IsDivisible3(Numerator: QWord): Boolean;
> begin
>   Result := (Numerator * $AAAAAAAAAAAAAAAB) <= $5555555555555555;
> end;
>
> (One of my merge requests produces this code from "Result := (x mod 3) 
> = 0")
>
> My question is this: can "movq $0x5555555555555555,%rax" be replaced 
> with "shrq $0x1,%rax" without incurring an additional pipeline stall?  
> The MOV instruction takes 10 bytes to encode, while "SHR 1" takes only 
> 3.  Since %rax is used beforehand and the CMP instruction has to wait 
> until the IMUL instruction has finished executing, logic tells me that 
> I can get away with it here.  What I'm not sure of is which metric to 
> go by: the execution speed of IMUL (i.e. the IMUL instruction is the 
> limiting factor before CMP can be executed), or the simple fact that 
> %rax has already been used and will already hold $AAAAAAAAAAAAAAAB by 
> the time it is given a new value.
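
The arithmetic behind the proposed substitution can be checked on its 
own: shifting the multiplier right by one bit gives exactly the 
comparison constant. A minimal sketch in C follows; the helper name 
limit_from_multiplier is my own assumption. It only confirms that both 
sequences leave the same value in the register and says nothing about 
timing or stalls.

```c
#include <stdint.h>

/* Hypothetical helper mirroring "shrq $0x1,%rax": derive the comparison
   limit from the multiplier that is still held in the register. */
static uint64_t limit_from_multiplier(uint64_t multiplier)
{
    /* uint64_t is unsigned, so this is a logical (zero-filling) shift. */
    return multiplier >> 1;
}
```

Passing 0xAAAAAAAAAAAAAAAB returns 0x5555555555555555, the constant the 
MOV would otherwise load.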
>
> Gareth aka. Kit
>
>