[fpc-devel] A minor stalling problem on i386 and x86-64

J. Gareth Moreton gareth at moreton-family.com
Thu Jan 9 13:19:07 CET 2020


Hi everyone,

Partly from reading around the Internet and partly from seeing the occasional things that GCC does, I've
realised that there's a bit of a problem when it comes to false dependencies on CPU registers.  It arises when,
say, a 32-bit register is allocated, used and deallocated, and then a smaller part of it (say, the lower 16 bits)
is used later.

e.g.

// allocate %eax
xorl %eax,%eax
...
movl %eax,(mem1)
// deallocate %eax
// allocate %ax
movw $2000,%ax
...

The issue here is that the "movw" instruction incurs an expensive partial-register write penalty: the upper
16 bits of %eax have to be preserved (and may be non-zero), so the new value has to be merged with whatever
was last written to %eax, and that merge is the false dependency.  The way to mitigate this is to expand the
"movw" instruction to a 32-bit "movl" instruction, or to use "movzwl" when the source is another register or
a memory location.  That way, the entire register is overwritten and the CPU's register renaming scheme can
be fully utilised.
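As a rough illustration of the rewrite described above, here is a minimal Python sketch that works on plain AT&T-syntax strings rather than on the compiler's real data structures.  The function name widen_movw and the register table are hypothetical, and the sketch assumes the caller has already established that the upper 16 bits of the destination are dead; it does not check that itself.

```python
# Hypothetical table of 16-bit general-purpose registers and their 32-bit parents
# (i386 subset; %sp is deliberately left out).
WIDE = {"%ax": "%eax", "%bx": "%ebx", "%cx": "%ecx", "%dx": "%edx",
        "%si": "%esi", "%di": "%edi", "%bp": "%ebp"}


def widen_movw(instr):
    """Rewrite 'movw src,%reg16' so that it writes the whole 32-bit register.

    ASSUMPTION: the upper 16 bits of the destination are dead at this point.
    Anything that does not match the pattern is returned unchanged.
    """
    op, _, operands = instr.strip().partition(" ")
    if op != "movw":
        return instr
    # Split on the last comma so memory operands like 4(%ebx,%ecx,2) stay intact.
    src, sep, dst = operands.rpartition(",")
    dst = dst.strip()
    if not sep or dst not in WIDE:
        return instr  # destination is memory, or not a plain 16-bit register
    src = src.strip()
    if src.startswith("$"):
        # Immediate source: movl leaves the same low 16 bits; whatever lands in
        # the upper 16 bits is irrelevant because they are assumed dead.
        return f"movl {src},{WIDE[dst]}"
    # Register or memory source: zero-extend so the full register is written.
    return f"movzwl {src},{WIDE[dst]}"


print(widen_movw("movw $2000,%ax"))    # movl $2000,%eax
print(widen_movw("movw (mem1),%bx"))   # movzwl (mem1),%ebx
```

The immediate case uses "movl" because the low 16 bits come out identical, while register and memory sources go through "movzwl", matching the two mitigations suggested in the paragraph above.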

The thing is, there's no straightforward way to do this other than to force all writes to be at least 32-bit,
which is horribly wasteful.  One way around it is to watch the tai_regalloc entries more carefully to see
exactly which sub-register is in use, but there are a few instances where register allocation isn't done
properly, or where the full 32-bit register is allocated instead of just the sub-register.  I'm thinking up
ideas as to how one can approach this problem.
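A minimal sketch of the tai_regalloc-watching idea, again in Python, using a hypothetical flat list of ("alloc", reg), ("dealloc", reg) and ("instr", text) entries in place of FPC's actual tai_regalloc objects.  The function name widenable_writes is made up for illustration.  A 16-bit "movw" is reported as safe to widen only when its 16-bit register has an explicit allocation and the enclosing 32-bit register is not allocated; a missing allocation marker or a live 32-bit register makes it answer conservatively.

```python
# Hypothetical map from 16-bit register names to their 32-bit parents.
PARENT = {"ax": "eax", "bx": "ebx", "cx": "ecx", "dx": "edx",
          "si": "esi", "di": "edi", "bp": "ebp"}


def widenable_writes(stream):
    """Return the indices of 'movw ...,%reg16' entries whose upper 16 bits are dead.

    `stream` is a list of (kind, arg) tuples where kind is "alloc", "dealloc"
    or "instr"; it stands in for the compiler's register allocation markers.
    ASSUMPTION: the allocation markers are accurate.
    """
    live = set()  # register names (any width) currently allocated
    safe = []
    for i, (kind, arg) in enumerate(stream):
        if kind == "alloc":
            live.add(arg)
        elif kind == "dealloc":
            live.discard(arg)
        elif kind == "instr":
            op, _, operands = arg.strip().partition(" ")
            dest = operands.rsplit(",", 1)[-1].strip().lstrip("%")
            # Only a pure 16-bit write whose own allocation is visible and whose
            # 32-bit parent is not allocated is treated as widenable.
            if (op == "movw" and dest in PARENT and dest in live
                    and PARENT[dest] not in live):
                safe.append(i)
    return safe


example = [
    ("alloc", "eax"),
    ("instr", "xorl %eax,%eax"),
    ("instr", "movl %eax,(mem1)"),
    ("dealloc", "eax"),
    ("alloc", "ax"),
    ("instr", "movw $2000,%ax"),
]
print(widenable_writes(example))  # [5]
```

Fed the sequence from the example at the top of the post, it flags only the "movw".  If %eax were still allocated at that point, or %ax had no allocation entry (one of the "not done properly" cases), nothing would be flagged.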

Gareth aka. Kit

