[fpc-devel] Policy regarding SHL/SHR under x86
J. Gareth Moreton
gareth at moreton-family.com
Mon Oct 24 12:51:32 CEST 2022
Hi everyone,
I'm looking at more optimisations under x86_64 that use the BMI
instructions. I've come across one situation that I need clarity on...
how are SHL and SHR instructions handled if the shift value exceeds the
word size?
For example, say I have the code "power2 := 1 shl index;"... assuming
everything here is a LongWord (32-bit unsigned integer), what happens if
index is 32 or higher? Logic would state that the result should be
zero, since all of the bits are essentially shifted out of the register,
but under x86, the SHL and SHR instructions effectively do the following:
power2 := 1 shl (index mod 32);
("mod 64" for 64-bit registers)
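To make the difference concrete, here is a rough Python model of the two behaviours for 32-bit values. It is only a sketch, not compiler code: the names shl32_x86 and shl32_unmasked are invented for this illustration, and the "unmasked" variant simply encodes the assumption that bits shifted past bit 31 are discarded.

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def shl32_x86(value, count):
    """Model of a 32-bit x86 SHL: only the low 5 bits of the count are used."""
    return (value << (count & 31)) & MASK32


def shl32_unmasked(value, count):
    """Model of the 'logical' expectation: bits shifted past bit 31 are lost."""
    return (value << count) & MASK32


if __name__ == "__main__":
    # Compare both models around the word-size boundary.
    for index in (0, 31, 32, 33):
        print(index, hex(shl32_x86(1, index)), hex(shl32_unmasked(1, index)))
```

The two models agree for every index from 0 to 31 and diverge from 32 upwards, which is exactly the range in question.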
So in this case, 1 shl 32 will return 1. This may cause problems with
other platforms where the index isn't masked like this, and I've noticed
code in the compiler that looks out for this, which is especially
important for bitmasks since "(1 shl 32) - 1" won't return $FFFFFFFF as
expected, but 0 instead. There are some problems though because the
BZHI instruction, which is otherwise great for producing such masks,
doesn't modify the input if the index is equal to or greater than the
word size (it does set the carry flag though) and so the equivalent of
"(1 shl 32) - 1" will return $FFFFFFFF.
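The mismatch can be sketched with a small Python model of both approaches. This is an illustration under assumptions rather than anything taken from the compiler: the function names are hypothetical, and bzhi32 follows my reading of the instruction (the low 8 bits of the index are used, the source is passed through unchanged when that value exceeds 31, and the carry flag reports that case).

```python
MASK32 = 0xFFFFFFFF  # keep results within 32 bits


def shl32_x86(value, count):
    """Model of a 32-bit x86 SHL: only the low 5 bits of the count are used."""
    return (value << (count & 31)) & MASK32


def mask_via_shl_x86(index):
    """(1 shl index) - 1 as 32-bit x86 shift semantics would compute it."""
    return (shl32_x86(1, index) - 1) & MASK32


def bzhi32(src, index):
    """Model of 32-bit BZHI. Returns (result, carry_flag)."""
    n = index & 0xFF  # assumption: only the low 8 bits of the index matter
    if n > 31:
        return src & MASK32, 1  # source passed through, carry set
    return src & ((1 << n) - 1), 0  # bits n..31 cleared, carry clear


if __name__ == "__main__":
    # Show where the shift-based mask and the BZHI-based mask part ways.
    for index in (0, 5, 31, 32):
        print(index, hex(mask_via_shl_x86(index)), bzhi32(MASK32, index))
```

In this model the two agree for every index from 0 to 31, so swapping one for the other is only safe when the index is known to be below the word size.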
From a cross-platform perspective, how are too-large indices generally
handled to ensure consistent cross-platform behaviour? I know ARM and
AArch64 also reduce the index modulo the word size, but I don't know if
this holds for other platforms. If one assumes (index mod word_size),
it will make any kind of BZHI optimisation a little janky (any kind of
optimisation that doesn't account for the case of a too-large index
would only be valid under -O4 rules).
Kit