[fpc-devel] Optimization of redundant mov's
Martok
listbox at martoks-place.de
Sat Mar 18 23:32:55 CET 2017
Hi all,
there has been some discussion about FPC's optimizer in #31444, which prompted me
to investigate some of my own code. Generally speaking, the generated assembly is
not all that bad (I like how it uses LEA for almost all integer arithmetic),
but I keep seeing sections with lots of redundant MOVs.
Example from a SHA512 implementation. CurrentHash is a field of the current
class; the code was compiled with anything above -O2, plus -CpCOREAVX2 and
-Px86_64.
a:= CurrentHash[0]; b:= CurrentHash[1]; c:= CurrentHash[2]; d:= CurrentHash[3];
0000000100074943 488b8424a0020000 mov 0x2a0(%rsp),%rax
000000010007494B 4c8b5038 mov 0x38(%rax),%r10
000000010007494F 488b8424a0020000 mov 0x2a0(%rsp),%rax
0000000100074957 4c8b5840 mov 0x40(%rax),%r11
000000010007495B 488b9424a0020000 mov 0x2a0(%rsp),%rdx
0000000100074963 488b4248 mov 0x48(%rdx),%rax
0000000100074967 488b9424a0020000 mov 0x2a0(%rsp),%rdx
000000010007496F 488b6a50 mov 0x50(%rdx),%rbp
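Every one of the "mov 0x2a0(%rsp),%rxx" instructions except the first is
redundant: each one reloads the same pointer from the stack and costs another
memory round-trip. At the same time, more registers are used (%rax and %rdx both
end up holding that same pointer), which probably makes other optimizations more
difficult, especially if something similar happens on i386, which has only eight
general-purpose registers.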
Now, the fun part: I haven't been able to build a simple test that reproduces the
issue (there, the Self pointer is already in %rcx and is not fetched from the
stack each time), so I suspect this is a side effect of some other part of the
code.
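For reference, a minimal test of the kind described above might look like the
following sketch. The class TSHA512Sketch and the method MixFirstFour are
hypothetical stand-ins, not the actual SHA512 code, and the field offsets will
not match the 0x38..0x50 seen in the listing. Since small tests like this are
exactly the ones where Self stays in %rcx, it is not expected to reproduce the
reloads on its own; it only isolates the access pattern in question.

```pascal
program RedundantMovSketch;

{$mode objfpc}{$H+}

type
  // Hypothetical stand-in for the SHA512 class: only the field access pattern matters.
  TSHA512Sketch = class
  private
    CurrentHash: array[0..7] of QWord;
  public
    constructor Create;
    function MixFirstFour: QWord;
  end;

constructor TSHA512Sketch.Create;
var
  i: Integer;
begin
  inherited Create;
  // Fill the state with 1..8 so the result is easy to check.
  for i := 0 to 7 do
    CurrentHash[i] := QWord(i + 1);
end;

function TSHA512Sketch.MixFirstFour: QWord;
var
  a, b, c, d: QWord;
begin
  // Four consecutive field reads through Self, as in the SHA512 code.
  // The question is whether Self stays in a register or is reloaded
  // from the stack before each read.
  a := CurrentHash[0]; b := CurrentHash[1]; c := CurrentHash[2]; d := CurrentHash[3];
  Result := a xor b xor c xor d;
end;

var
  Sketch: TSHA512Sketch;
begin
  Sketch := TSHA512Sketch.Create;
  try
    WriteLn(Sketch.MixFirstFour); // 1 xor 2 xor 3 xor 4 = 4
  finally
    Sketch.Free;
  end;
end.
```

Compiling it with the options quoted above plus -al (keep the generated
assembler file, with source lines interleaved) makes it easy to compare the code
for MixFirstFour against the listing from the real implementation.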
Does this sound familiar to anyone? If so, what could I do about it?
Regards,
Martok