[fpc-devel] x86_64 question

J. Gareth Moreton gareth at moreton-family.com
Thu Oct 1 22:36:58 CEST 2020

I thought that might be the case - thanks Nikolay.  And I meant to say 
lower bits of a REGISTER, not an instruction!

Admittedly I'm cycle-counting and byte-counting again!  I was looking 
for ways to reduce 13 bytes of padding in one of my pure assembly 
language routines and realised I could make a saving there.  The only 
thing I can think of that I have to watch out for logically is that if I 
change, say, TEST EAX, $80 to TEST AL, $80, the latter will set the sign 
flag if the most-significant bit of the result (after the implicit 'and' 
operation) is 1, while the former always clears the sign flag.

I have used such subregisters before in the FPC RTL, in fpc_int_real and 
fpc_frac_real in rtl/x86_64/math.inc, where I read AX instead of the 
larger RAX, but only after executing "SHR RAX, 48", which guarantees 
that everything above the 16th bit is zero, and only after testing 
other implementation candidates in a kind of informal competition. 
(Surprisingly, I think "shr $48, %rax; and $0x7ff0,%ax; cmp $0x4330,%ax" 
runs faster than moving 64-bit constants into temporary registers (since 
64-bit immediates aren't supported outside of MOV) and using 'and' and 
'cmp' on %rax directly.)

I think you always get a read penalty when using the high-byte registers 
(AH, BH, CH and DH) because the processor has to do an implicit shift 
operation.

Thanks again for the answer.

Gareth aka. Kit

On 01/10/2020 19:43, Nikolay Nikolov via fpc-devel wrote:
> On 10/1/20 8:17 PM, J. Gareth Moreton via fpc-devel wrote:
>> Hi everyone,
>> I have a small question with assembler size optimisation that maybe 
>> one of you guys can give me a second opinion on:
>> If you are using the "test" instruction to test some of the lower 
>> bits of an instruction, e.g. TEST RCX, $2, is there a penalty with 
>> calling TEST CL, $2 instead? The instruction size is a lot smaller on 
>> account of the immediate only being 1 byte long instead of 4 bytes, 
>> and they are mathematically equivalent.  I know you have to be careful 
>> with partial write penalties, but partial reads seem to be a bit more 
>> nebulous (the register is not modified with TEST).
> Yes, I think the shorter TEST CL, $2 is preferred over TEST RCX, $2 on 
> every x86_64 CPU. AFAIK, there's no penalty for using 8-bit 
> subregisters (except perhaps AH, BH, CH and DH, but the FPC code 
> generator doesn't use them). Others can correct me if I'm wrong.
> Nikolay
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
