[fpc-devel] Register renaming and false dependency question

J. Gareth Moreton gareth at moreton-family.com
Sun Oct 17 13:25:23 CEST 2021

Hi everyone,

While reading up on some algorithms, I came across a recommendation of 
using a shorter arithmetic function to change the value of a constant in 
a register rather than loading the new value directly.  However, the 
algorithm assumes a RISC-like processor, so I'm not sure if it applies 
to an Intel x86-64 processor.  Consider the following:

movq $0xaaaaaaaaaaaaaaab,%rax
imulq   %rax,%rcx
movq $0x5555555555555555,%rax
cmpq    %rax,%rcx
setle  %al

This algorithm sets %al to 1 if %rcx is divisible by 3, and 0 if it's 
not, and was compiled from the following Pascal code (under -O3, but -O1 
produces almost exactly the same):

function IsDivisible3(Numerator: QWord): Boolean;
   Result := (Numerator * $AAAAAAAAAAAAAAAB) <= $5555555555555555;

(One of my merge requests produces this code from "Result := (x mod 3) = 0")

My question is this: can "movq $0x5555555555555555,%rax" be replaced 
with "shrq $0x1,%rax" without incurring an additional pipeline stall?  
The MOV instruction takes 10 bytes to store, while "SHR 1" takes only 
3.  Given that %rax is used beforehand and the CMP instruction has to 
wait until the IMUL instruction has finished executing, logic tells me 
that I can get away with it here, but I'm not sure if the metric to go 
by is the execution speed of IMUL (i.e. the IMUL instruction is the 
limiting factor before CMP can be executed), or the simple fact that the 
previous value of %rax was used and will be loaded with 
$AAAAAAAAAAAAAAAB by the time it comes to load it with a new value.

Gareth aka. Kit

This email has been checked for viruses by Avast antivirus software.

More information about the fpc-devel mailing list