[fpc-devel] Register deallocation
J. Gareth Moreton
gareth at moreton-family.com
Fri Oct 23 02:10:19 CEST 2020
Hi everyone,
So I've been investigating a new optimisation, using Florian's
GetNextInstructionUsingRegTrackingUse method, that improves upon
removing MOV instructions and the like that write to registers whose
values are never used (usually because the subroutine exits soon after),
and a few times it even eliminates a register completely from a
subroutine (theoretically it means I can remove the "push/pop" pair and
SEH directives for that register).
I have run into one problem though, and I haven't been able to solve it
yet (although I have one idea that I'll investigate when I'm less
tired). It seems in some rare circumstances, volatile registers aren't
deallocated properly after a "call" instruction. One that stands out is
the "fpc_ansistr_concat_multi" routine in the Win64 version of the
System unit (search for ".section .text.n_fpc_ansistr_concat_multi" in
the assembler dump "system.s")... it inserts a call to
"fpc_unicodestr_assign" but doesn't free the volatile registers until
much later. When compiling under -O4, the result is that
"fpc_unicodestr_assign" is called, then %edx is ALLOCATED (an allocation
that is then removed because the tracked registers show %rdx is already
assigned), and then %edx is used for temporary storage (and these
instructions are removed by the peephole optimizer via "MovMov2Mov 3").
The problem
is, because of the register tracking and how
GetNextInstructionUsingRegTrackingUse works, it now looks like the "xorq
%rdx,%rdx" instruction prior to "fpc_unicodestr_assign" is a dead store,
since the value of %rdx is completely overwritten by the commands
following the "call" instruction and nothing in the tracking information
indicates that this is a new allocation. This of course is false because
%rdx contains one of fpc_unicodestr_assign's parameters.
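
To make the false dead store easier to see, here is a toy model in Python.
It is only a sketch of the situation described above, not FPC code: the
tuple encoding and the function name looks_like_dead_store are made up for
illustration, and it assumes a forward scan that stops at an explicit read
or a deallocation marker and does not model the implicit parameter reads of
a "call".

```python
# Toy model only: none of these names exist in FPC.
# Each entry is (kind, registers); "rdx" stands for both %rdx and %edx.

def looks_like_dead_store(code, store_index, reg):
    """Return True if the scan sees `reg` overwritten before any
    explicit read or deallocation marker (hypothetical scan rule)."""
    for kind, regs in code[store_index + 1:]:
        if reg not in regs:
            continue  # the call lists no registers, so it is skipped
        if kind in ("read", "dealloc"):
            return False  # value consumed or live range ended: keep store
        if kind == "write":
            return True   # overwritten with no visible use in between
    return False          # ran off the end: stay conservative

# What the tracking information shows today: no dealloc after the call.
observed = [
    ("write", ("rdx",)),    # xorq %rdx,%rdx (parameter set-up)
    ("call", ()),           # call fpc_unicodestr_assign
    ("write", ("rdx",)),    # %edx reused as temporary storage
]

# What I would expect: the volatile register is released after the call.
expected = [
    ("write", ("rdx",)),
    ("call", ()),
    ("dealloc", ("rdx",)),
    ("alloc", ("rdx",)),
    ("write", ("rdx",)),
]

print(looks_like_dead_store(observed, 0, "rdx"))  # True
print(looks_like_dead_store(expected, 0, "rdx"))  # False
```

In this model the deallocation marker is the only thing that tells the scan
the first value's life has ended, which is why losing it makes the "xorq"
look removable.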
So far it's not causing problems with the peephole optimizer as is, but
it's causing an annoying block in my optimisation work and is kind of
incorrect with regard to register tracking. Additionally, because all of
the volatile registers are marked as 'in use' until many instructions
later, the register allocator is forced to use a non-volatile register
(%ebx in this case). To summarise, the volatile registers aren't being
deallocated immediately after "call fpc_unicodestr_assign" in the
"fpc_ansistr_concat_multi" subroutine, and this is blocking potential new
optimisations and is a source of minor inefficiencies.
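
The inefficiency can be sketched in the same toy fashion. The register
classes below follow the Win64 calling convention, but pick_register and
its "prefer volatile first" policy are assumptions made for illustration;
FPC's real allocator is far more involved.

```python
# Toy model only: pick_register is hypothetical, not an FPC routine.
# Win64 integer registers by calling-convention class (64-bit names;
# the non-volatile tuple is a subset, e.g. the %ebx above is part of rbx).
VOLATILE = ("rax", "rcx", "rdx", "r8", "r9", "r10", "r11")
NON_VOLATILE = ("rbx", "rsi", "rdi", "r12", "r13", "r14", "r15")

def pick_register(in_use):
    """Assumed policy: take the first free volatile register,
    otherwise fall back to a non-volatile one."""
    for reg in VOLATILE + NON_VOLATILE:
        if reg not in in_use:
            return reg
    return None  # nothing free; a real allocator would spill

# Volatile registers still marked 'in use' long after the call:
still_marked = set(VOLATILE)
print(pick_register(still_marked))  # rbx

# Volatile registers released straight after the call:
print(pick_register(set()))         # rax
```

Falling back to a non-volatile register is what forces the extra push/pop
pair and SEH directive in the prologue and epilogue.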
If anyone has any answers or insight to this anomaly, I would be most
grateful. Thank you.
Gareth aka. Kit
P.S. To compile the system unit and get the assembler dump as I
describe, build the RTL under Win64 with:
  make clean all FPC=(freshly built FPC binary) OPT="-O4 -a -ar"
You might want to build FPC with the "-dDEBUG_OPTALLOC" definition as
well for extra information in the dumps.