[fpc-devel] A minor stalling problem on i386 and x86-64
J. Gareth Moreton
gareth at moreton-family.com
Thu Jan 9 13:19:07 CET 2020
Hi everyone,
Partly from reading around the Internet and seeing what GCC occasionally does, I've realised that there's a
bit of a problem with false dependencies on CPU registers. It shows up when, say, a 32-bit register is
allocated, used and deallocated, and then a smaller part of it, say the lower 16 bits, is used later.
e.g.
// allocate %eax
xorl %eax,%eax
...
movl %eax,(mem1)
// deallocate %eax
// allocate %ax
movw $2000,%ax
...
The issue here is that there's an expensive partial-write penalty on the "movw" instruction: it only writes
the lower 16 bits, so the upper 16 bits of %eax have to be preserved (and may be non-zero), which makes the
instruction depend on the old value of %eax even though that value is no longer needed. The way to mitigate
this is to expand the "movw" instruction to a 32-bit "movl" instruction, or to use "movzwl" when the source is
another register or a memory read. That way, the entire register is overwritten and the CPU's register
renaming scheme can be fully utilised.
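As a rough sketch of what the example above might look like after that change, assuming the upper 16 bits of
%eax really are dead at the point of the 16-bit write (%cx and (mem2) are placeholders of mine, in the same
spirit as (mem1)):
// allocate %eax
xorl %eax,%eax
...
movl %eax,(mem1)
// deallocate %eax
// allocate %ax
movl $2000,%eax      // was: movw $2000,%ax; writes all 32 bits, so no dependency on the old %eax
...
// when the 16-bit value comes from another register or from memory instead:
movzwl %cx,%eax      // zero-extends %cx into the whole of %eax
movzwl (mem2),%eax   // zero-extends a 16-bit memory read into the whole of %eax
On x86-64 a 32-bit write also zeroes the upper 32 bits of the 64-bit register, so the same widening covers the
full %rax as well.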
The thing is, there is no straightforward way to do this other than forcing all writes to be at least 32 bits
wide, which is horribly wasteful. One way around it is to watch the tai_regalloc entries more carefully to see
exactly which sub-register is in use, but there are a few instances where register allocation isn't done
properly, or where the full 32-bit register is allocated. I'm thinking up ideas for how to approach this
problem.
Gareth aka. Kit