[fpc-devel] Division nodes

Thu May 11 20:41:58 CEST 2023

This is the code block in question (ncnv.pas, starting at line 3397) - 
if anyone can explain why it has to be set up this way, or add comments 
to the code, I will be most grateful (it's run for the following node 
types: subn, addn, muln, divn, modn, xorn, andn, orn, shln, shrn):

   exclude(n.flags,nf_internal);
   if not forceunsigned and
      is_signed(n.resultdef) then
     begin
       originaldivtree:=nil;
       if n.nodetype in [divn,modn] then
         originaldivtree:=n.getcopy;
doremoveinttypeconvs(level+1,tbinarynode(n).left,signedtype,false,signedtype,unsignedtype);
doremoveinttypeconvs(level+1,tbinarynode(n).right,signedtype,false,signedtype,unsignedtype);
       n.resultdef:=signedtype;
       if n.nodetype in [divn,modn] then
         begin
           newblock:=internalstatements(newstatements);
tempnode:=ctempcreatenode.create(n.resultdef,n.resultdef.size,tt_persistent,true);
           addstatement(newstatements,tempnode);
           addstatement(newstatements,cifnode.create_internal(
caddnode.create_internal(equaln,tbinarynode(n).right.getcopy,cordconstnode.create(-1,n.resultdef,false)),
               cassignmentnode.create_internal(
                 ctemprefnode.create(tempnode),
cmoddivnode.create(n.nodetype,tbinarynode(originaldivtree).left.getcopy,cordconstnode.create(-1,tbinarynode(originaldivtree).right.resultdef,false))
               ),
               cassignmentnode.create_internal(
                 ctemprefnode.create(tempnode),n
               )
             )
           );
addstatement(newstatements,ctempdeletenode.create_normal_temp(tempnode));
addstatement(newstatements,ctemprefnode.create(tempnode));
           n:=newblock;
           do_typecheckpass(n);
           originaldivtree.free;
         end;
     end

(the new division/modulus by -1 is then converted elsewhere)

Kit

On 11/05/2023 18:01, J. Gareth Moreton via fpc-devel wrote:
> P.S. I found the code that adds the conditional checks; it's 
> "doremoveinttypeconvs" in the ncnv unit.  However, it's very unclear 
> as to WHY it's doing it as there's no comments around the code block.
>
> Kit
>
> On 11/05/2023 15:39, J. Gareth Moreton via fpc-devel wrote:
>> It does seem odd.  In a practical sense, the only time I can see -1 
>> being a common input among other random numbers is if it's an error 
>> value, in which case you would most likely do special handling rather 
>> than pass it through a division operation.  With the slowdown that 
>> comes from additional branch prediction, it just seems like 
>> unnecessary fluff, but I need to double-check to see if there's a 
>> very good reason behind their generation (if it's a platform-specific 
>> problem, it should be moved to that platform's specific first pass)  
>> Now I just need to find out where those nodes are generated - they're 
>> proving elusive!
>>
>> Note that using constant divisors uses a different optimisation, so 
>> this only applies to variable divisors.
>>
>> Kit
>>
>> On 11/05/2023 12:07, Stefan Glienke via fpc-devel wrote:
>>> Looks like a rather disadvantageous way to avoid the idiv 
>>> instruction because x div -1 = -x and x mod -1 = 0.
>>>
>>> I ran a quick benchmark doing a lot of integer divisions where 
>>> sometimes (randomly) the divisor was -1. When the occurence was rare 
>>> enough (~5%) the performance was not impacted, the higher the 
>>> occurence of -1 was the slower it became to almost half as fast. 
>>> Only when less than 5% of the divisors were *not* -1 the performance 
>>> was better up to twice as fast when all divisors were -1. Of couse 
>>> ymmv as it depends on the CPU and the branch predictor behavior but 
>>> it shows that this "optimization" is hardly any good.
>>>
>>> I cannot think of a realistic case where 95% of your divisors are -1 
>>> and you really need to save those few extra cycles of calling idiv.
>>>
>>>> On 11/05/2023 11:04 CEST J. Gareth Moreton via fpc-devel 
>>>> <fpc-devel at lists.freepascal.org> wrote:
>>>>
>>>>   Hi everyone,
>>>>
>>>> I need to ask a question about how division nodes are set up (I'm
>>>> looking at possible optimisation techniques).  I've written the
>>>> following procedure:
>>>>
>>>> procedure DoDivMod(N, D: Integer; out Q, R: Integer);
>>>> begin
>>>>     Q := N div D;
>>>>     R := N mod D;
>>>> end;
>>>>
>>>> Fairly simple and to the point.  However, even before the first node
>>>> pass, the following node tree is generated for an integer division
>>>> operation:
>>>>
>>>> <statementn pos="24,10">
>>>>      <ifn resultdef="$void" pos="24,10" flags="nf_internal">
>>>>         <condition>
>>>>            <equaln resultdef="Boolean" pos="24,10" 
>>>> flags="nf_internal">
>>>>               <loadn resultdef="LongInt" pos="24,14">
>>>>                  <symbol>D</symbol>
>>>>               </loadn>
>>>>               <ordconstn resultdef="LongInt" pos="24,10" 
>>>> rangecheck="FALSE">
>>>>                  <value>-1</value>
>>>>               </ordconstn>
>>>>            </equaln>
>>>>         </condition>
>>>>         <then>
>>>>            <assignn resultdef="$void" pos="24,10" flags="nf_internal">
>>>>               <temprefn resultdef="LongInt" pos="24,10" 
>>>> flags="nf_write"
>>>> id="$7C585E10">
>>>>                  <typedef>LongInt</typedef>
>>>> <tempflags>ti_may_be_in_reg</tempflags>
>>>> <temptype>tt_persistent</temptype>
>>>>               </temprefn>
>>>>               <unaryminusn resultdef="LongInt" pos="24,10">
>>>>                  <loadn resultdef="LongInt" pos="24,8">
>>>>                     <symbol>N</symbol>
>>>>                  </loadn>
>>>>               </unaryminusn>
>>>>            </assignn>
>>>>         </then>
>>>>         <else>
>>>>            <assignn resultdef="$void" pos="24,10" flags="nf_internal">
>>>>               <temprefn resultdef="LongInt" pos="24,10" 
>>>> flags="nf_write"
>>>> id="$7C585E10">
>>>>                  <typedef>LongInt</typedef>
>>>> <tempflags>ti_may_be_in_reg</tempflags>
>>>> <temptype>tt_persistent</temptype>
>>>>               </temprefn>
>>>>               <divn resultdef="LongInt" pos="24,10">
>>>>                  <loadn resultdef="LongInt" pos="24,8">
>>>>                     <symbol>N</symbol>
>>>>                  </loadn>
>>>>                  <loadn resultdef="LongInt" pos="24,14">
>>>>                     <symbol>D</symbol>
>>>>                  </loadn>
>>>>               </divn>
>>>>            </assignn>
>>>>         </else>
>>>>      </ifn>
>>>> </statementn>
>>>>
>>>> Something similar is made for "mod" as well.  I have to ask 
>>>> though... is
>>>> it really necessary to check to see if the divisor is -1 and have a
>>>> distinct assignment for it?  It's a bit of a rare edge case that 
>>>> usually
>>>> just slows things down since it tends to add a comparison and a
>>>> conditional jump to the final assembly language.  Is there some
>>>> anomalous behaviour to a processor's division routine if the 
>>>> divisor is -1?
>>>>
>>>> At the very least, would it be possible to remove the conditional 
>>>> check
>>>> when compiling under -Os?
>>>>
>>>> (I intend to see if it's possible to merge "N div D" and "N mod D" on
>>>> x86, and possibly other processors that have a combined DIV/MOD 
>>>> operator).
>>>>
>>>> Kit
>>>>
>>>> _______________________________________________
>>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>> _______________________________________________
>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>>
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>