[fpc-devel] Register renaming and false dependency question
Stefan Glienke
sglienke at dsharp.org
Mon Oct 18 00:53:41 CEST 2021
According to compiler explorer clang, gcc and msvc compile this to the
same code with -O3 as FPC does. So I would assume that is fine.
Am 17.10.2021 um 13:25 schrieb J. Gareth Moreton via fpc-devel:
> Hi everyone,
>
> While reading up on some algorithms, I came across a recommendation of
> using a shorter arithmetic function to change the value of a constant
> in a register rather than loading the new value directly. However,
> the algorithm assumes a RISC-like processor, so I'm not sure if it
> applies to an Intel x86-64 processor. Consider the following:
>
> movq $0xaaaaaaaaaaaaaaab,%rax
> imulq %rax,%rcx
> movq $0x5555555555555555,%rax
> cmpq %rax,%rcx
> setle %al
>
> This algorithm sets %al to 1 if %rcx is divisible by 3, and 0 if it's
> not, and was compiled from the following Pascal code (under -O3, but
> -O1 produces almost exactly the same):
>
> function IsDivisible3(Numerator: QWord): Boolean;
> begin
> Result := (Numerator * $AAAAAAAAAAAAAAAB) <= $5555555555555555;
> end;
>
> (One of my merge requests produces this code from "Result := (x mod 3)
> = 0")
>
> My question is this: can "movq $0x5555555555555555,%rax" be replaced
> with "shrq $0x1,%rax" without incurring an additional pipeline stall?
> The MOV instruction takes 10 bytes to store, while "SHR 1" takes only
> 3. Given that %rax is used beforehand and the CMP instruction has to
> wait until the IMUL instruction has finished executing, logic tells me
> that I can get away with it here, but I'm not sure if the metric to go
> by is the execution speed of IMUL (i.e. the IMUL instruction is the
> limiting factor before CMP can be executed), or the simple fact that
> the previous value of %rax was used and will be loaded with
> $AAAAAAAAAAAAAAAB by the time it comes to load it with a new value.
>
> Gareth aka. Kit
>
>
More information about the fpc-devel
mailing list