[fpc-devel] AMD64 - more efficient code padding

Florian Klämpfl florian at freepascal.org
Thu Nov 16 22:02:47 CET 2017


On 11.11.2017 at 16:53, J. Gareth Moreton wrote:
> Hi everyone,
> 
> So I've noticed that when certain blocks of code are aligned (usually to a 16-byte boundary), the bytes in 
> between are set to a combination of 90H, 66 90H, 66 66 90H or 66 66 66 90H.  This is fine and all, but for 
> any sequences larger than 4 bytes, it requires up to 4 instructions, which might start incurring a small 
> performance penalty in the instruction queue (although given the size of the queue is generally over 50 
> instructions, this is negligible at best).
> 
> However, reading the Intel instruction reference, they recommend the following sequences:
> 
> 1 byte - 90H
> 2 bytes - 66 90H
> 3 bytes - 0F 1F 00H
> 4 bytes - 0F 1F 40 00H (AMD still recommends 66 66 66 90H)
> 5 bytes - 0F 1F 44 00 00H
> 6 bytes - 66 0F 1F 44 00 00H
> 7 bytes - 0F 1F 80 00 00 00 00H
> 8 bytes - 0F 1F 84 00 00 00 00 00H
> 9 bytes - 66 0F 1F 84 00 00 00 00 00H
> 
> Now, they do warn that 0F 1FH will trigger a SIGILL if the processor doesn't support it (unlike 90H, which 
> is an alias of "xchg %ax, %ax"), however it has been supported since the Pentium Pro, and is all but 
> guaranteed to be supported on AMD64 because of the requirements of features like SSE2 that arrived in the 
> Pentium 4 era.  Is it worth updating the longer byte sequences to use the 5-to-9-byte sequences for a 
> very minor performance boost and reduction in file entropy (the 00s will be easier to compress since they 
> generally appear more frequently in the entirety of the binary)?

What version did you use? If the FPC internal assembler is used, trunk should always use the
sequences mentioned above on x86-64; on i386 it uses them if -Cppentium2 or higher is specified.
I implemented this in revisions 29772 and 29777.
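
For illustration only, here is a minimal C sketch of how a padding gap could be filled from the table quoted above, taking the longest sequence first. It is not the code from the FPC revisions mentioned; the names nop_table, MAX_NOP_LEN and fill_nops are hypothetical, and the greedy longest-first strategy is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Longest NOP encoding listed in the quoted table. */
#define MAX_NOP_LEN 9

/* Row n-1 holds the n-byte sequence from the quoted table;
   unused trailing bytes in each row are zero and never copied. */
static const uint8_t nop_table[MAX_NOP_LEN][MAX_NOP_LEN] = {
    {0x90},
    {0x66, 0x90},
    {0x0F, 0x1F, 0x00},
    {0x0F, 0x1F, 0x40, 0x00},
    {0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00},
    {0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00},
    {0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
    {0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00},
};

/* Hypothetical helper: fill buf[0..len) with NOPs, longest first.
   Returns the number of NOP instructions written. */
static size_t fill_nops(uint8_t *buf, size_t len)
{
    size_t count = 0;

    assert(buf != NULL || len == 0);
    while (len > 0) {
        /* Take the longest available sequence that still fits. */
        size_t n = len < MAX_NOP_LEN ? len : MAX_NOP_LEN;
        memcpy(buf, nop_table[n - 1], n);
        buf += n;
        len -= n;
        count++;
    }
    return count;
}
```

With this sketch, the worst case for 16-byte alignment (15 bytes of padding) is covered by two instructions (9 + 6 bytes), whereas sequences limited to 4 bytes need four.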



