[fpc-devel] Policy on platform-specific compiler code

Sun Oct 18 02:13:34 CEST 2020

Well, I think you might be right on this one, Jonas!

I've tested my algorithm against the one used in the compiler. It's 5 
times faster when used with small divisors (so loop iterations are 
minimal)... but that amounts to about 15 nanoseconds compared to 75 
nanoseconds!  Additionally, it treats the "magic add" differently (the 
reason why it randomly failed).

Gareth aka. Kit

On 16/10/2020 23:14, J. Gareth Moreton via fpc-devel wrote:
> On 16/10/2020 10:47, Jonas Maebe via fpc-devel wrote:
>> On 16/10/2020 10:14, J. Gareth Moreton via fpc-devel wrote:
>>
>>> Before I go optimising the wrong thing, I have a question to ask.
>>> What's the policy on platform-specific assembly language in the
>>> compiler, or any code designed to run on a specific (source) platform
>>> (and using a more generic implementation otherwise via $ifdef)?  I ask
>>> because I have a faster algorithm for "calc_divconst_magic_unsigned" in
>>> 'compiler/cgutils.pas', but it's only able to work because it can take
>>> advantage of the x86 DIV instruction using RDX:RAX (or EDX:EAX) as a
>>> double-wide dividend. It is somewhat faster than what currently exists
>>> because of the lack of a loop whose iteration count is proportional to
>>> log2(d), where d is the desired divisor (in other words, it's slower 
>>> the
>>> bigger the divisor is, whereas my algorithm is constant speed).
>> In general, there should be no assembly language in the compiler. Ialso
>>   don't think that's worth it in this case. Unless (or maybe "even if")
>> your code contains nothing but divisions by constants, I doubt this code
>> has a significant effect on the total compile time.
>
> Division by constants has a fairly frequent occurrance in code. For 
> example, dividing by 10000 whenever Currency is used, and 1000 often 
> appears in timing measurements (e.g. if t is in milliseconds, then t 
> div 1000 is seconds and t mod 1000 is the leftover milliseconds).
>
> The existing code contains two divisions by a variable (so they can't 
> be optimised) and a loop that has, at most, N iterations, where N is 
> the bit size (often 32 or 64).  The loop contains only addition, 
> subtraction and multiplication, and 3 branches to contend with (not 
> including the repeat...until jump).  My code contains a single DIV, 
> but also a BSR which is effectively used to get the base-2 logarithm 
> of the divisor (also throws an internal error if the divisor is zero, 
> since this should have been caught already).
>
> Granted, you may be right and the saving won't be worth it, not to 
> mention the additional complexity (and my function currently fails on 
> certain divisors unexpectedly, so I'll have to do some deeper testing 
> if just for my own peace of mind!) - only a lot of timing tests will 
> determine that.  Nevertheless, thanks for providing the article to 
> calculating the reciprocal though - that's definitely helpful in 
> understanding what's going on.
>
> Gareth aka. Kit
>
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>

-- 
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus