[fpc-devel] Policy regarding SHL/SHR under x86

J. Gareth Moreton gareth at moreton-family.com
Mon Oct 24 12:51:32 CEST 2022

Hi everyone,

I'm looking at more optimisations under x86_64 that use the BMI 
instructions.  I've come across one situation that I need clarity on: 
how are SHL and SHR instructions handled if the shift value equals or 
exceeds the word size?

For example, say I have the code "power2 := 1 shl index;"... assuming 
everything here is a LongWord (32-bit unsigned integer), what happens if 
index is 32 or higher?  Logic would state that the result should be 
zero, since all of the bits are shifted out of the register, but under 
x86 the SHL and SHR instructions mask the shift count to its low 5 bits 
(6 bits for 64-bit operands), effectively doing the following:

power2 := 1 shl (index mod 32);

("mod 64" for 64-bit registers)

So in this case, 1 shl 32 will return 1.  This may cause problems on 
other platforms where the index isn't masked like this, and I've noticed 
code in the compiler that looks out for this.  It is especially 
important for bitmasks, since "(1 shl 32) - 1" won't return $FFFFFFFF as 
expected, but 0 instead.  There is a problem, though: the BZHI 
instruction, which is otherwise great for producing such masks, leaves 
its input unmodified if the index is equal to or greater than the word 
size (it does set the carry flag), and so the BZHI equivalent of 
"(1 shl 32) - 1" returns $FFFFFFFF.

From a cross-platform perspective, how are too-large indices generally 
handled to ensure consistent cross-platform behaviour?  I know AArch64 
also reduces the index modulo the word size (32-bit ARM instead uses the 
bottom byte of the shift register, so shifts of 32 to 255 yield zero), 
but I don't know what other platforms do.  If one assumes 
(index mod word_size), it will make any kind of BZHI optimisation a 
little janky (any kind of optimisation that doesn't account for the case 
of a too-large index would only be valid under -O4 rules).

