[fpc-devel] AMD64 - more efficient code padding

J. Gareth Moreton gareth at moreton-family.com
Sat Nov 11 16:53:18 CET 2017

Hi everyone,

So I've noticed that when certain blocks of code are aligned (usually to a 16-byte boundary), the bytes in
between are set to a combination of 90H, 66 90H, 66 66 90H or 66 66 66 90H.  This is fine and all, but for
any sequence larger than 4 bytes it requires up to 4 instructions (15 bytes of padding becomes 4 + 4 + 4 + 3),
which might start incurring a small performance penalty in the instruction queue (although given the size of
the queue is generally over 50 instructions, this is negligible).

However, the Intel instruction reference recommends the following sequences:

1 byte - 90H
2 bytes - 66 90H
3 bytes - 0F 1F 00H
4 bytes - 0F 1F 40 00H (AMD still recommends 66 66 66 90H)
5 bytes - 0F 1F 44 00 00H
6 bytes - 66 0F 1F 44 00 00H
7 bytes - 0F 1F 80 00 00 00 00H
8 bytes - 0F 1F 84 00 00 00 00 00H
9 bytes - 66 0F 1F 84 00 00 00 00 00H

Now, they do warn that 0F 1FH will trigger a SIGILL if the processor doesn't support it (unlike 90H, which
is an alias of "xchg %ax, %ax").  However, it has been supported since the Pentium Pro, and is all but
guaranteed to be supported on AMD64, since every AMD64 processor must already support features like SSE2,
which arrived later with the Pentium 4.  Is it worth updating the longer byte sequences to use the
5-to-9-byte sequences for a very minor performance boost and a reduction in file entropy (the 00s will be
easier to compress since they generally appear more frequently across the binary as a whole)?
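
To put rough numbers on the instruction counts, here is a minimal Python sketch (not FPC code; the names LONG_NOPS, SHORT_NOPS, pad and padding_for_alignment are made up for illustration). It fills a gap greedily, longest NOP first, once with the 1-to-4-byte forms emitted today and once with the sequences listed above:

```python
# Multi-byte NOP encodings from the list above, keyed by length in bytes.
LONG_NOPS = {
    1: bytes([0x90]),
    2: bytes([0x66, 0x90]),
    3: bytes([0x0F, 0x1F, 0x00]),
    4: bytes([0x0F, 0x1F, 0x40, 0x00]),
    5: bytes([0x0F, 0x1F, 0x44, 0x00, 0x00]),
    6: bytes([0x66, 0x0F, 0x1F, 0x44, 0x00, 0x00]),
    7: bytes([0x0F, 0x1F, 0x80, 0x00, 0x00, 0x00, 0x00]),
    8: bytes([0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
    9: bytes([0x66, 0x0F, 0x1F, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00]),
}

# The forms currently emitted: 90H preceded by zero to three 66H prefixes.
SHORT_NOPS = {n: bytes([0x66] * (n - 1) + [0x90]) for n in range(1, 5)}


def padding_for_alignment(offset, alignment=16):
    """Number of filler bytes needed to reach the next aligned offset."""
    return (-offset) % alignment


def pad(count, table):
    """Split `count` filler bytes into NOPs from `table`, longest first.

    Assumes `table` has an entry for every length from 1 to its maximum.
    """
    longest = max(table)
    chunks = []
    while count > 0:
        size = min(count, longest)
        chunks.append(table[size])
        count -= size
    return chunks


if __name__ == "__main__":
    for gap in range(1, 16):
        old = pad(gap, SHORT_NOPS)
        new = pad(gap, LONG_NOPS)
        print(f"{gap:2d} bytes: {len(old)} NOPs now, {len(new)} with long forms")
```

In the worst case for 16-byte alignment (15 bytes of filler) this gives 4 instructions with the current forms and 2 with the longer ones. The greedy split is only one possible choice; an assembler could just as well prefer two NOPs of similar length.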

Yours faithfully,

J. Gareth Moreton
