[fpc-devel] Register deallocation

J. Gareth Moreton gareth at moreton-family.com
Fri Oct 23 02:10:19 CEST 2020

Hi everyone,

So I've been investigating a new optimisation, using Florian's 
GetNextInstructionUsingRegTrackingUse method, that improves upon 
removing MOV instructions and the like that write to registers whose 
values are never used (usually because the subroutine exits soon after), 
and a few times it even eliminates a register completely from a 
subroutine (theoretically it means I can remove the "push/pop" pair and 
SEH directives for that register).

I have run into one problem though, and I haven't been able to solve it 
yet (although I have one idea that I'll investigate when I'm less 
tired).  It seems in some rare circumstances, volatile registers aren't 
deallocated properly after a "call" instruction. One that stands out is 
the "fpc_ansistr_concat_multi" routine in the Win64 version of the 
System unit (search for ".section .text.n_fpc_ansistr_concat_multi" in 
the assembler dump "system.s")... it inserts a call to 
"fpc_unicodestr_assign" but doesn't free the volatile registers until 
much later.  When compiling under -O4, the result is that 
"fpc_unicodestr_assign" is called, then %edx is ALLOCATED (an allocation 
that is then removed because the tracked registers show %rdx is already 
assigned), and then %edx is used for temporary storage (these 
instructions are removed by the peephole optimizer via "MovMov2Mov 3").  
The problem 
is, because of the register tracking and how 
GetNextInstructionUsingRegTrackingUse works, it now looks like the "xorq 
%rdx,%rdx" instruction prior to "fpc_unicodestr_assign" is a dead store, 
since the value of %rdx is completely overwritten by the commands 
following the "call" instruction and nothing in the tracking information 
indicates that this is a new allocation. This of course is false because 
%rdx contains one of fpc_unicodestr_assign's parameters.
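
As a toy illustration of the effect described above, here is a minimal 
Python sketch. It is not FPC code: the instruction tuples, the function 
name and the scanning rules are all assumptions made for this example, 
and the "movl" after the call is a placeholder, since the real 
instruction is not quoted here. It only shows how a forward scan that 
relies on allocation markers can mistake a parameter load for a dead 
store when those markers are missing after a call.

```python
# Toy model only: the tuples and names below are hypothetical and do not
# mirror FPC's real data structures or GetNextInstructionUsingRegTrackingUse.

def looks_like_dead_store(instrs, store_idx, reg):
    """Scan forward from a store to `reg` and guess whether it is dead.

    Assumed rules of this toy scanner:
      * an "alloc"/"dealloc" marker for `reg` ends the scan conservatively,
        because the old live range is over and something may have used it;
      * an instruction that reads `reg` keeps the store alive;
      * an instruction that overwrites `reg` without reading it makes the
        earlier store look dead;
      * a "call" is opaque: its parameter registers are not listed.
    """
    for kind, *args in instrs[store_idx + 1:]:
        if kind in ("alloc", "dealloc"):
            if args[0] == reg:
                return False
        elif kind == "op":
            _text, writes, reads = args
            if reg in reads:
                return False
            if reg in writes:
                return True
        # kind == "call": nothing to inspect in this model
    return False  # ran off the end: stay conservative


# Sequence as the scanner sees it when the markers after the call are absent.
without_markers = [
    ("op", "xorq %rdx,%rdx", {"rdx"}, set()),   # parameter for the call
    ("call", "fpc_unicodestr_assign"),
    ("op", "movl <tmp>,%edx", {"rdx"}, set()),  # placeholder temporary use
]

# Same sequence with the register released and re-allocated after the call.
with_markers = [
    ("op", "xorq %rdx,%rdx", {"rdx"}, set()),
    ("call", "fpc_unicodestr_assign"),
    ("dealloc", "rdx"),
    ("alloc", "rdx"),
    ("op", "movl <tmp>,%edx", {"rdx"}, set()),
]

print(looks_like_dead_store(without_markers, 0, "rdx"))
print(looks_like_dead_store(with_markers, 0, "rdx"))
```

In this model the first sequence reports the "xorq" as a dead store, 
while the second stops at the marker and leaves it alone.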

So far it's not causing problems with the peephole optimizer as is, but 
it's an annoying block on my optimisation work and is kind of 
incorrect with regard to register tracking. Additionally, because all of 
the volatile registers are marked as 'in use' until many instructions 
later, the register allocator is forced to use a non-volatile register 
(%ebx in this case).  To summarise, the volatile registers aren't being 
deallocated immediately after "call fpc_unicodestr_assign" in the 
"fpc_ansistr_concat_multi" subroutine, which is blocking potential new 
optimisations and is a source of minor inefficiencies.

If anyone has any answers or insight to this anomaly, I would be most 
grateful.  Thank you.

Gareth aka. Kit

P.S. To compile the system unit and get the assembler dump as I 
describe, build the RTL under Win64 with 'make clean all FPC=(freshly 
built FPC binary) OPT="-O4 -a -ar"'.  You might want to build FPC with 
the "-dDEBUG_OPTALLOC" definition as well for extra information in the 
