[fpc-devel] Discussion on a particular optimisation development (WARNING: Technical!)
Sven Barth
pascaldragon at googlemail.com
Wed Feb 3 21:23:26 CET 2021
Am 03.02.2021 um 20:25 schrieb Florian Klämpfl via fpc-devel:
> Am 02.02.21 um 22:06 schrieb J. Gareth Moreton via fpc-devel:
>> Hi everyone,
>>
>> I've found a potential optimisation for conditions of the form "(x <>
>> 0) and (y <> 0)", which are very common because this is semantically
>> equivalent to "Assigned(x) and Assigned(y)", for example, and such a
>> construct is generated implicity in TObject.Destroy, for example.
>> The node tree for TObject.Destroy example (after the first pass) is
>> as follows (it checks the value of Self and VMT):
>>
>> The node tree inside the if-node's condition branch is as follows:
>>
>> <andn resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>> complexity="3">
>> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>> complexity="1">
>> <loadn resultdef="TObject" pos="329,9" flags="nf_pass1_done"
>> complexity="0">
>> <symbol>self</symbol>
>> </loadn>
>> <niln resultdef="TObject" pos="329,9" flags="nf_pass1_done"
>> complexity="0">
>> </niln>
>> </unequaln>
>> <unequaln resultdef="Boolean" pos="329,9" flags="nf_pass1_done"
>> complexity="1">
>> <typeconvn resultdef="$char_pointer" pos="329,9"
>> flags="nf_pass1_done,nf_explicit,nf_internal" complexity="0"
>> convtype="tc_equal">
>> <typeconvn resultdef="Pointer" pos="329,9"
>> flags="nf_pass1_done" complexity="0" convtype="tc_equal">
>> <loadn resultdef="<no type symbol>" pos="329,9"
>> flags="nf_pass1_done" complexity="0">
>> <symbol>vmt</symbol>
>> </loadn>
>> </typeconvn>
>> </typeconvn>
>> <pointerconstn resultdef="$char_pointer" pos="329,9"
>> flags="nf_pass1_done,nf_explicit" complexity="0">
>> <value>$0000000000000000</value>
>> </pointerconstn>
>> </unequaln>
>> </andn>
>>
>> On x86-64_win64, the following assembly language is generated (smart
>> pipelining and out-of-order execution MIGHT permit it to be executed
>> in 3 cycles, but it's more likely going to take 4 or 5):
>>
>> testq %rbx,%rbx
>> setneb %al
>> testq %rsi,%rsi
>> setneb %dl
>> andb %dl,%al
>>
>> DeMorgan's Laws for 2 inputs state that "not (A or B) = not A and not
>> B", and "not (A and B) = not A or not B", and using these, we can
>> develop much more efficient assembly language (which will almost
>> certainly take 3 cycles to run, and is much smaller):
>>
>> movq %rbx,%rdx
>> orq %rsi,%rdx
>> seteb %al
>>
>> For this particular routine, %rsi and %dl/%rdx are not used
>> afterwards, and can be simplified further (this bit will probably be
>> a peephole optimisation), dropping the cycle count to 2:
>>
>> orq %rbx,%rsi
>> seteb %al
>
> I am not sure how this is equal:
>
> consider rbx=1, esi=1: current code results in al=1; the last sample
> results in al=0. Or do I miss something?
Due to DeMorgan's Laws (not A and not B = not (A or B)) I'd say that
this should be "setneb %al".
Regards,
Sven
More information about the fpc-devel
mailing list