[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)

Sven Barth pascaldragon at googlemail.com
Wed Feb 3 21:23:26 CET 2021


Am 03.02.2021 um 20:25 schrieb Florian Klämpfl via fpc-devel:
> Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
>> Hi everyone,
>>
>> I've found a potential optimisation for conditions of the form "(x <> 
>> 0) and (y <> 0)", which are very common because this is semantically 
>> equivalent to "Assigned(x) and Assigned(y)", for example, and such a 
>> construct is generated implicity in TObject.Destroy, for example.  
>> The node tree for TObject.Destroy example (after the first pass) is 
>> as follows (it checks the value of Self and VMT):
>>
>> The node tree inside the if-node's condition branch is as follows:
>>
>> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>> complexity="3">
>>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>> complexity="1">
>>        <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
>> complexity="0">
>>           <symbol>self</symbol>
>>        </loadn>
>>        <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
>> complexity="0">
>>        </niln>
>>     </unequaln>
>>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>> complexity="1">
>>        <typeconvn resultdef="$char_pointer" pos="329,9" 
>> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" 
>> convtype="tc_equal">
>>           <typeconvn resultdef="Pointer" pos="329,9" 
>> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
>>              <loadn resultdef="<no type symbol>" pos="329,9" 
>> flags="nf_pass1_done" complexity="0">
>>                 <symbol>vmt</symbol>
>>              </loadn>
>>           </typeconvn>
>>        </typeconvn>
>>        <pointerconstn resultdef="$char_pointer" pos="329,9" 
>> flags="nf_pass1_done,nf_explicit" complexity="0">
>>           <value>$0000000000000000</value>
>>        </pointerconstn>
>>     </unequaln>
>> </andn>
>>
>> On x86-64_win64, the following assembly language is generated (smart 
>> pipelining and out-of-order execution MIGHT permit it to be executed 
>> in 3 cycles, but it's more likely going to take 4 or 5):
>>
>>      testq    %rbx,%rbx
>>      setneb   %al
>>      testq    %rsi,%rsi
>>      setneb   %dl
>>      andb     %dl,%al
>>
>> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not 
>> B", and "not (A and B) = not A or not B", and using these, we can 
>> develop much more efficient assembly language (which will almost 
>> certainly take 3 cycles to run, and is much smaller):
>>
>>      movq     %rbx,%rdx
>>      orq      %rsi,%rdx
>>      seteb    %al
>>
>> For this particular routine, %rsi and %dl/%rdx are not used 
>> afterwards, and can be simplified further (this bit will probably be 
>> a peephole optimisation), dropping the cycle count to 2:
>>
>>      orq      %rbx,%rsi
>>      seteb    %al
>
> I am not sure how this is equal:
>
> consider rbx=1, esi=1: current code results in al=1; the last sample 
> results in al=0. Or do I miss something?

Due to DeMorgan's Laws (not A and not B = not (A or B)) I'd say that 
this should be "setneb %al".

Regards,
Sven


More information about the fpc-devel mailing list