[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)
Florian Klämpfl
florian at freepascal.org
Wed Feb 3 20:25:51 CET 2021
Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
> Hi everyone,
>
> I've found a potential optimisation for conditions of the form "(x <> 0)
> and (y <> 0)", which are very common because this is semantically
> equivalent to "Assigned(x) and Assigned(y)", for example, and such a
> construct is generated implicity in TObject.Destroy, for example. The
> node tree for TObject.Destroy example (after the first pass) is as
> follows (it checks the value of Self and VMT):
>
> The node tree inside the if-node's condition branch is as follows:
>
> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
> complexity="1">
> <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done"
> complexity="0">
> <symbol>self</symbol>
> </loadn>
> <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done"
> complexity="0">
> </niln>
> </unequaln>
> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
> complexity="1">
> <typeconvn resultdef="$char_pointer" pos="329,9"
> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0"
> convtype="tc_equal">
> <typeconvn resultdef="Pointer" pos="329,9"
> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
> <loadn resultdef="<no type symbol>" pos="329,9"
> flags="nf_pass1_done" complexity="0">
> <symbol>vmt</symbol>
> </loadn>
> </typeconvn>
> </typeconvn>
> <pointerconstn resultdef="$char_pointer" pos="329,9"
> flags="nf_pass1_done,nf_explicit" complexity="0">
> <value>$0000000000000000</value>
> </pointerconstn>
> </unequaln>
> </andn>
>
> On x86-64_win64, the following assembly language is generated (smart
> pipelining and out-of-order execution MIGHT permit it to be executed in
> 3 cycles, but it's more likely going to take 4 or 5):
>
> testq %rbx,%rbx
> setneb %al
> testq %rsi,%rsi
> setneb %dl
> andb %dl,%al
>
> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not
> B", and "not (A and B) = not A or not B", and using these, we can
> develop much more efficient assembly language (which will almost
> certainly take 3 cycles to run, and is much smaller):
>
> movq %rbx,%rdx
> orq %rsi,%rdx
> seteb %al
>
> For this particular routine, %rsi and %dl/%rdx are not used afterwards,
> and can be simplified further (this bit will probably be a peephole
> optimisation), dropping the cycle count to 2:
>
> orq %rbx,%rsi
> seteb %al
I am not sure how this is equal:
consider rbx=1, esi=1: current code results in al=1; the last sample
results in al=0. Or do I miss something?
More information about the fpc-devel
mailing list