[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)

Florian Klämpfl florian at freepascal.org
Wed Feb 3 20:25:51 CET 2021


Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
> Hi everyone,
> 
> I've found a potential optimisation for conditions of the form "(x <> 0) 
> and (y <> 0)", which are very common because this is semantically 
> equivalent to "Assigned(x) and Assigned(y)", for example, and such a 
> construct is generated implicity in TObject.Destroy, for example.  The 
> node tree for TObject.Destroy example (after the first pass) is as 
> follows (it checks the value of Self and VMT):
> 
> The node tree inside the if-node's condition branch is as follows:
> 
> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" complexity="3">
>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
> complexity="1">
>        <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
> complexity="0">
>           <symbol>self</symbol>
>        </loadn>
>        <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
> complexity="0">
>        </niln>
>     </unequaln>
>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
> complexity="1">
>        <typeconvn resultdef="$char_pointer" pos="329,9" 
> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" 
> convtype="tc_equal">
>           <typeconvn resultdef="Pointer" pos="329,9" 
> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
>              <loadn resultdef="<no type symbol>" pos="329,9" 
> flags="nf_pass1_done" complexity="0">
>                 <symbol>vmt</symbol>
>              </loadn>
>           </typeconvn>
>        </typeconvn>
>        <pointerconstn resultdef="$char_pointer" pos="329,9" 
> flags="nf_pass1_done,nf_explicit" complexity="0">
>           <value>$0000000000000000</value>
>        </pointerconstn>
>     </unequaln>
> </andn>
> 
> On x86-64_win64, the following assembly language is generated (smart 
> pipelining and out-of-order execution MIGHT permit it to be executed in 
> 3 cycles, but it's more likely going to take 4 or 5):
> 
>      testq    %rbx,%rbx
>      setneb   %al
>      testq    %rsi,%rsi
>      setneb   %dl
>      andb     %dl,%al
> 
> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not 
> B", and "not (A and B) = not A or not B", and using these, we can 
> develop much more efficient assembly language (which will almost 
> certainly take 3 cycles to run, and is much smaller):
> 
>      movq     %rbx,%rdx
>      orq      %rsi,%rdx
>      seteb    %al
> 
> For this particular routine, %rsi and %dl/%rdx are not used afterwards, 
> and can be simplified further (this bit will probably be a peephole 
> optimisation), dropping the cycle count to 2:
> 
>      orq      %rbx,%rsi
>      seteb    %al

I am not sure how this is equal:

consider rbx=1, esi=1: current code results in al=1; the last sample 
results in al=0. Or do I miss something?


More information about the fpc-devel mailing list