[fpc-devel] Division nodes

Thu May 11 19:01:11 CEST 2023

P.S. I found the code that adds the conditional checks; it's 
"doremoveinttypeconvs" in the ncnv unit.  However, it's very unclear as 
to WHY it's doing it as there's no comments around the code block.

Kit

On 11/05/2023 15:39, J. Gareth Moreton via fpc-devel wrote:
> It does seem odd.  In a practical sense, the only time I can see -1 
> being a common input among other random numbers is if it's an error 
> value, in which case you would most likely do special handling rather 
> than pass it through a division operation.  With the slowdown that 
> comes from additional branch prediction, it just seems like 
> unnecessary fluff, but I need to double-check to see if there's a very 
> good reason behind their generation (if it's a platform-specific 
> problem, it should be moved to that platform's specific first pass)  
> Now I just need to find out where those nodes are generated - they're 
> proving elusive!
>
> Note that using constant divisors uses a different optimisation, so 
> this only applies to variable divisors.
>
> Kit
>
> On 11/05/2023 12:07, Stefan Glienke via fpc-devel wrote:
>> Looks like a rather disadvantageous way to avoid the idiv instruction 
>> because x div -1 = -x and x mod -1 = 0.
>>
>> I ran a quick benchmark doing a lot of integer divisions where 
>> sometimes (randomly) the divisor was -1. When the occurence was rare 
>> enough (~5%) the performance was not impacted, the higher the 
>> occurence of -1 was the slower it became to almost half as fast. Only 
>> when less than 5% of the divisors were *not* -1 the performance was 
>> better up to twice as fast when all divisors were -1. Of couse ymmv 
>> as it depends on the CPU and the branch predictor behavior but it 
>> shows that this "optimization" is hardly any good.
>>
>> I cannot think of a realistic case where 95% of your divisors are -1 
>> and you really need to save those few extra cycles of calling idiv.
>>
>>> On 11/05/2023 11:04 CEST J. Gareth Moreton via fpc-devel 
>>> <fpc-devel at lists.freepascal.org> wrote:
>>>
>>>   Hi everyone,
>>>
>>> I need to ask a question about how division nodes are set up (I'm
>>> looking at possible optimisation techniques).  I've written the
>>> following procedure:
>>>
>>> procedure DoDivMod(N, D: Integer; out Q, R: Integer);
>>> begin
>>>     Q := N div D;
>>>     R := N mod D;
>>> end;
>>>
>>> Fairly simple and to the point.  However, even before the first node
>>> pass, the following node tree is generated for an integer division
>>> operation:
>>>
>>> <statementn pos="24,10">
>>>      <ifn resultdef="$void" pos="24,10" flags="nf_internal">
>>>         <condition>
>>>            <equaln resultdef="Boolean" pos="24,10" flags="nf_internal">
>>>               <loadn resultdef="LongInt" pos="24,14">
>>>                  <symbol>D</symbol>
>>>               </loadn>
>>>               <ordconstn resultdef="LongInt" pos="24,10" 
>>> rangecheck="FALSE">
>>>                  <value>-1</value>
>>>               </ordconstn>
>>>            </equaln>
>>>         </condition>
>>>         <then>
>>>            <assignn resultdef="$void" pos="24,10" flags="nf_internal">
>>>               <temprefn resultdef="LongInt" pos="24,10" 
>>> flags="nf_write"
>>> id="$7C585E10">
>>>                  <typedef>LongInt</typedef>
>>> <tempflags>ti_may_be_in_reg</tempflags>
>>> <temptype>tt_persistent</temptype>
>>>               </temprefn>
>>>               <unaryminusn resultdef="LongInt" pos="24,10">
>>>                  <loadn resultdef="LongInt" pos="24,8">
>>>                     <symbol>N</symbol>
>>>                  </loadn>
>>>               </unaryminusn>
>>>            </assignn>
>>>         </then>
>>>         <else>
>>>            <assignn resultdef="$void" pos="24,10" flags="nf_internal">
>>>               <temprefn resultdef="LongInt" pos="24,10" 
>>> flags="nf_write"
>>> id="$7C585E10">
>>>                  <typedef>LongInt</typedef>
>>> <tempflags>ti_may_be_in_reg</tempflags>
>>> <temptype>tt_persistent</temptype>
>>>               </temprefn>
>>>               <divn resultdef="LongInt" pos="24,10">
>>>                  <loadn resultdef="LongInt" pos="24,8">
>>>                     <symbol>N</symbol>
>>>                  </loadn>
>>>                  <loadn resultdef="LongInt" pos="24,14">
>>>                     <symbol>D</symbol>
>>>                  </loadn>
>>>               </divn>
>>>            </assignn>
>>>         </else>
>>>      </ifn>
>>> </statementn>
>>>
>>> Something similar is made for "mod" as well.  I have to ask 
>>> though... is
>>> it really necessary to check to see if the divisor is -1 and have a
>>> distinct assignment for it?  It's a bit of a rare edge case that 
>>> usually
>>> just slows things down since it tends to add a comparison and a
>>> conditional jump to the final assembly language.  Is there some
>>> anomalous behaviour to a processor's division routine if the divisor 
>>> is -1?
>>>
>>> At the very least, would it be possible to remove the conditional check
>>> when compiling under -Os?
>>>
>>> (I intend to see if it's possible to merge "N div D" and "N mod D" on
>>> x86, and possibly other processors that have a combined DIV/MOD 
>>> operator).
>>>
>>> Kit
>>>
>>> _______________________________________________
>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>