[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)
Florian Klämpfl
florian at freepascal.org
Wed Feb 3 21:48:53 CET 2021
Am 03.02.21 um 21:23 schrieb Sven Barth via fpc-devel:
> Am 03.02.2021 um 20:25 schrieb Florian Klämpfl via fpc-devel:
>> Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
>>> Hi everyone,
>>>
>>> I've found a potential optimisation for conditions of the form "(x <>
>>> 0) and (y <> 0)", which are very common because this is semantically
>>> equivalent to "Assigned(x) and Assigned(y)", for example, and such a
>>> construct is generated implicity in TObject.Destroy, for example. The
>>> node tree for TObject.Destroy example (after the first pass) is as
>>> follows (it checks the value of Self and VMT):
>>>
>>> The node tree inside the if-node's condition branch is as follows:
>>>
>>> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>>> complexity="3">
>>> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>>> complexity="1">
>>> <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done"
>>> complexity="0">
>>> <symbol>self</symbol>
>>> </loadn>
>>> <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done"
>>> complexity="0">
>>> </niln>
>>> </unequaln>
>>> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>>> complexity="1">
>>> <typeconvn resultdef="$char_pointer" pos="329,9"
>>> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0"
>>> convtype="tc_equal">
>>> <typeconvn resultdef="Pointer" pos="329,9"
>>> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
>>> <loadn resultdef="<no type symbol>" pos="329,9"
>>> flags="nf_pass1_done" complexity="0">
>>> <symbol>vmt</symbol>
>>> </loadn>
>>> </typeconvn>
>>> </typeconvn>
>>> <pointerconstn resultdef="$char_pointer" pos="329,9"
>>> flags="nf_pass1_done,nf_explicit" complexity="0">
>>> <value>$0000000000000000</value>
>>> </pointerconstn>
>>> </unequaln>
>>> </andn>
>>>
>>> On x86-64_win64, the following assembly language is generated (smart
>>> pipelining and out-of-order execution MIGHT permit it to be executed
>>> in 3 cycles, but it's more likely going to take 4 or 5):
>>>
>>> testq %rbx,%rbx
>>> setneb %al
>>> testq %rsi,%rsi
>>> setneb %dl
>>> andb %dl,%al
>>>
>>> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not
>>> B", and "not (A and B) = not A or not B", and using these, we can
>>> develop much more efficient assembly language (which will almost
>>> certainly take 3 cycles to run, and is much smaller):
>>>
>>> movq %rbx,%rdx
>>> orq %rsi,%rdx
>>> seteb %al
>>>
>>> For this particular routine, %rsi and %dl/%rdx are not used
>>> afterwards, and can be simplified further (this bit will probably be
>>> a peephole optimisation), dropping the cycle count to 2:
>>>
>>> orq %rbx,%rsi
>>> seteb %al
>>
>> I am not sure how this is equal:
>>
>> consider rbx=1, esi=1: current code results in al=1; the last sample
>> results in al=0. Or do I miss something?
>
> Due to DeMorgan's Laws (not A and not B = not (A or B)) I'd say that
> this should be "setneb %al".
Yes, so we get "not((x=0) or (y=0))". But this does not translate to
"not(x or y)".
More information about the fpc-devel
mailing list