[fpc-devel] Optimization of redundant mov's

Jonas Maebe jonas at freepascal.org
Mon Mar 20 08:50:42 CET 2017


On 19/03/17 21:28, Martok wrote:
>
>> It's called register spilling: once there are no registers left to hold
>> values, the compiler has to pick registers whose value will be kept in
>> memory instead.
> I thought it would be something like that...
>
> Still, my main issue was with the repeated fetches. I'd (naively!) say that it
> should be relatively easy for an assembly-level optimizer to detect that these
> are repeated loads of the same thing, with nothing that could affect the outcome
> in between. It's not even a CSE in the technical sense, not a sub-expression but
> the entire thing...

It is trivial to create a peephole optimization for that particular 
pattern. At least if it's just two loads, because after you've optimized 
the second load into a register move, the third load no longer fits the 
pattern... Unless you create a special peephole optimizer pass that goes 
over the code backwards to apply this specific optimization, or you 
first match the pattern as many times as possible before changing it. 
But then it will still fail if there is at least one other instruction 
in between.
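
To make the ordering problem concrete, here is a minimal sketch in Python. 
The tuple-based instruction format and the names peephole and 
is_redundant_pair are assumptions made up for this illustration; they are 
not FPC's internal representation or its actual peephole optimizer.

```python
# Toy instruction format (hypothetical, for illustration only):
#   ("load", dst_reg, mem_operand)  or  (op, dst_reg, src)

def is_redundant_pair(first, second):
    """True if `second` reloads the memory operand that `first` just loaded."""
    return (
        first[0] == "load"
        and second[0] == "load"
        and first[2] == second[2]
        # The first load must not overwrite a register used in the address.
        and first[1] not in second[2]
    )

def peephole(code, backwards=False):
    """Turn the second load of each adjacent redundant pair into a mov."""
    code = list(code)
    indices = range(len(code) - 1)
    if backwards:
        indices = reversed(indices)
    for i in indices:
        if is_redundant_pair(code[i], code[i + 1]):
            code[i + 1] = ("mov", code[i + 1][1], code[i][1])
    return code

three_loads = [
    ("load", "eax", "[ebp-8]"),
    ("load", "ecx", "[ebp-8]"),
    ("load", "edx", "[ebp-8]"),
]

# Forward pass: the second load becomes a mov, so the third load no longer
# has a load in front of it and is left alone.
forward = peephole(three_loads)

# Backward pass: each pair is matched while both are still loads.
backward = peephole(three_loads, backwards=True)

# One unrelated instruction in between defeats the pattern entirely.
interleaved = peephole([
    ("load", "eax", "[ebp-8]"),
    ("add", "ebx", "1"),
    ("load", "ecx", "[ebp-8]"),
])
```

In this toy model the forward pass leaves the third load untouched, the 
backward pass rewrites both, and neither helps once another instruction 
sits between the loads.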

So then you have to slightly generalise it, and in the end you do end up 
with a full-blown assembler CSE optimizer, like the one we removed for 
3.0. I'm a staunch believer in not wasting time on stuff like that; it's 
just not worth it. Especially since a better register allocator, or SSA, 
can probably achieve the same thing in this case.
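
As a rough idea of what "slightly generalising" involves, the following 
Python sketch tracks which register currently holds which memory operand 
within one basic block. It is an illustration under assumed names 
(forward_loads and the toy tuple format are hypothetical) and is not a 
reconstruction of the CSE optimizer that was removed for 3.0. Even this 
toy version already needs invalidation rules for register writes and 
stores, and it ignores calls, aliasing analysis, partial registers and 
control flow.

```python
def forward_loads(block):
    """Replace reloads of a still-available memory operand with a mov.

    Toy format (hypothetical): ("load", dst, mem), ("store", mem, src),
    or (op, dst, src) for any other instruction that writes `dst`.
    """
    available = {}  # memory operand -> register currently holding its value
    out = []
    for ins in block:
        op = ins[0]
        if op == "store":
            # Conservative: assume a store may alias any tracked location.
            available.clear()
            out.append(ins)
            continue
        dst = ins[1]
        if op == "load" and ins[2] in available:
            out.append(("mov", dst, available[ins[2]]))
        else:
            out.append(ins)
        # Writing dst invalidates values held in dst and addresses using dst.
        available = {
            mem: reg for mem, reg in available.items()
            if reg != dst and dst not in mem
        }
        if op == "load" and dst not in ins[2]:
            available[ins[2]] = dst
    return out

block = [
    ("load", "eax", "[ebp-8]"),
    ("add", "ebx", "eax"),
    ("load", "ecx", "[ebp-8]"),   # forwarded from eax
    ("add", "eax", "1"),          # eax no longer holds the loaded value
    ("load", "edx", "[ebp-8]"),   # forwarded from ecx instead
    ("store", "[ebp-8]", "edx"),  # conservatively drops all knowledge
    ("load", "esi", "[ebp-8]"),   # stays a real load
]
optimized = forward_loads(block)
```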

>> E.g. those memory loads
>> are probably optimised by the processor itself (not necessarily coming
>> even from the L1 cache, but possibly from the write-back buffer).
> Not as well as one might believe, manually fixing (by forcing @CurrentHash into
> a register with a local variable) just those 4 lines gives a ~2% increase in
> MB/s for this hash. Which is quite a lot, given this is the part *without*
> actual computations.

You cannot attribute those 2% exclusively to keeping the values in 
registers. For example, removing the loads can change branch target 
alignments. Even adding random nops can get you 10% due to changed code 
layout.

> And again, I've seen this happen more than once on i386 code, where it even
> creates "fake" register pressure (by using 2 or more registers to hold exactly
> the same temporary)

That's again something that needs to be solved at the register allocator 
level (with SSA). Freeing up registers afterwards is useless, since only 
the register allocator can keep values in them permanently.


Jonas


