[fpc-devel] AMD64 - more efficient code padding
J. Gareth Moreton
gareth at moreton-family.com
Sat Nov 11 16:53:18 CET 2017
Hi everyone,
I've noticed that when certain blocks of code are aligned (usually to a 16-byte boundary), the bytes in
between are filled with a combination of 90H, 66 90H, 66 66 90H and 66 66 66 90H. This works, but any gap
longer than 4 bytes needs several instructions (up to 4 for a 15-byte gap), which might start incurring a
small performance penalty in the instruction queue (although, given that the queue generally holds over 50
instructions, the penalty is likely negligible).
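To make the instruction count concrete, here is a minimal sketch (not FPC's actual code) of the arithmetic
involved. The function names are hypothetical, and it assumes the current scheme simply emits the longest
available NOP (at most 4 bytes) until the gap is filled.

```python
def padding_needed(offset, alignment=16):
    """Bytes required to advance `offset` to the next multiple of `alignment`."""
    return (-offset) % alignment


def legacy_nop_count(gap, max_len=4):
    """NOP instructions needed when the longest NOP is `max_len` bytes
    (assumed greedy fill: ceiling of gap / max_len)."""
    return -(-gap // max_len)


if __name__ == "__main__":
    gap = padding_needed(0x1005)   # code ends 5 bytes past a 16-byte boundary
    print(gap, legacy_nop_count(gap))
```

The worst case for 16-byte alignment is a 15-byte gap, for which legacy_nop_count returns 4.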
However, the Intel instruction reference recommends the following multi-byte NOP sequences:
1 byte - 90H
2 bytes - 66 90H
3 bytes - 0F 1F 00H
4 bytes - 0F 1F 40 00H (AMD still recommends 66 66 66 90H)
5 bytes - 0F 1F 44 00 00H
6 bytes - 66 0F 1F 44 00 00H
7 bytes - 0F 1F 80 00 00 00 00H
8 bytes - 0F 1F 84 00 00 00 00 00H
9 bytes - 66 0F 1F 84 00 00 00 00 00H
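As a rough illustration of how these sequences could be used to fill an alignment gap, the sketch below
copies the table above into a lookup and fills greedily with the longest sequence that fits. The names
NOPS and make_padding are hypothetical, and the greedy strategy is an assumption for illustration rather
than what the compiler does today.

```python
# Multi-byte NOP encodings, keyed by length, as listed in the table above.
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("66 90"),
    3: bytes.fromhex("0F 1F 00"),
    4: bytes.fromhex("0F 1F 40 00"),
    5: bytes.fromhex("0F 1F 44 00 00"),
    6: bytes.fromhex("66 0F 1F 44 00 00"),
    7: bytes.fromhex("0F 1F 80 00 00 00 00"),
    8: bytes.fromhex("0F 1F 84 00 00 00 00 00"),
    9: bytes.fromhex("66 0F 1F 84 00 00 00 00 00"),
}


def make_padding(gap):
    """Return `gap` bytes of padding, always taking the longest NOP that fits."""
    out = bytearray()
    while gap > 0:
        n = min(gap, max(NOPS))   # longest sequence not exceeding the remaining gap
        out += NOPS[n]
        gap -= n
    return bytes(out)


if __name__ == "__main__":
    print(make_padding(15).hex(" "))
```

With this table a 15-byte gap is covered by two instructions (9 bytes plus 6 bytes) rather than four.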
Intel does warn that 0F 1FH will raise an invalid-opcode exception (a SIGILL on Unix-like systems) if the
processor doesn't support it (unlike 90H, which is an alias of "xchg %ax, %ax"). However, it has been
supported since the Pentium Pro, and is all but guaranteed to be supported on AMD64, because AMD64 requires
features like SSE2, which only arrived with the Pentium 4, well after the Pentium Pro. Is it worth updating
the longer byte sequences to use the 5-to-9-byte sequences for a very minor performance boost and a
reduction in file entropy (the 00s should be easier to compress, since they generally appear more
frequently across the rest of the binary)?
Yours faithfully,
J. Gareth Moreton