[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)

Florian Klämpfl florian at freepascal.org
Wed Feb 3 21:48:53 CET 2021


Am 03.02.21 um 21:23 schrieb Sven Barth via fpc-devel:
> Am 03.02.2021 um 20:25 schrieb Florian Klämpfl via fpc-devel:
>> Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
>>> Hi everyone,
>>>
>>> I've found a potential optimisation for conditions of the form "(x <> 
>>> 0) and (y <> 0)", which are very common because this is semantically 
>>> equivalent to "Assigned(x) and Assigned(y)", for example, and such a 
>>> construct is generated implicity in TObject.Destroy, for example. The 
>>> node tree for TObject.Destroy example (after the first pass) is as 
>>> follows (it checks the value of Self and VMT):
>>>
>>> The node tree inside the if-node's condition branch is as follows:
>>>
>>> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>>> complexity="3">
>>>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>>> complexity="1">
>>>        <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
>>> complexity="0">
>>>           <symbol>self</symbol>
>>>        </loadn>
>>>        <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done" 
>>> complexity="0">
>>>        </niln>
>>>     </unequaln>
>>>     <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done" 
>>> complexity="1">
>>>        <typeconvn resultdef="$char_pointer" pos="329,9" 
>>> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0" 
>>> convtype="tc_equal">
>>>           <typeconvn resultdef="Pointer" pos="329,9" 
>>> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
>>>              <loadn resultdef="<no type symbol>" pos="329,9" 
>>> flags="nf_pass1_done" complexity="0">
>>>                 <symbol>vmt</symbol>
>>>              </loadn>
>>>           </typeconvn>
>>>        </typeconvn>
>>>        <pointerconstn resultdef="$char_pointer" pos="329,9" 
>>> flags="nf_pass1_done,nf_explicit" complexity="0">
>>>           <value>$0000000000000000</value>
>>>        </pointerconstn>
>>>     </unequaln>
>>> </andn>
>>>
>>> On x86-64_win64, the following assembly language is generated (smart 
>>> pipelining and out-of-order execution MIGHT permit it to be executed 
>>> in 3 cycles, but it's more likely going to take 4 or 5):
>>>
>>>      testq    %rbx,%rbx
>>>      setneb   %al
>>>      testq    %rsi,%rsi
>>>      setneb   %dl
>>>      andb     %dl,%al
>>>
>>> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not 
>>> B", and "not (A and B) = not A or not B", and using these, we can 
>>> develop much more efficient assembly language (which will almost 
>>> certainly take 3 cycles to run, and is much smaller):
>>>
>>>      movq     %rbx,%rdx
>>>      orq      %rsi,%rdx
>>>      seteb    %al
>>>
>>> For this particular routine, %rsi and %dl/%rdx are not used 
>>> afterwards, and can be simplified further (this bit will probably be 
>>> a peephole optimisation), dropping the cycle count to 2:
>>>
>>>      orq      %rbx,%rsi
>>>      seteb    %al
>>
>> I am not sure how this is equal:
>>
>> consider rbx=1, esi=1: current code results in al=1; the last sample 
>> results in al=0. Or do I miss something?
> 
> Due to DeMorgan's Laws (not A and not B = not (A or B)) I'd say that 
> this should be "setneb %al".

Yes, so we get "not((x=0) or (y=0))". But this does not translate to 
"not(x or y)".



More information about the fpc-devel mailing list