[fpc-devel] Policy on platform-specific compiler code
J. Gareth Moreton
gareth at moreton-family.com
Sat Oct 17 00:14:20 CEST 2020
On 16/10/2020 10:47, Jonas Maebe via fpc-devel wrote:
> On 16/10/2020 10:14, J. Gareth Moreton via fpc-devel wrote:
>> Before I go optimising the wrong thing, I have a question to ask.
>> What's the policy on platform-specific assembly language in the
>> compiler, or any code designed to run on a specific (source) platform
>> (and using a more generic implementation otherwise via $ifdef)? I ask
>> because I have a faster algorithm for "calc_divconst_magic_unsigned" in
>> 'compiler/cgutils.pas', but it's only able to work because it can take
>> advantage of the x86 DIV instruction using RDX:RAX (or EDX:EAX) as a
>> double-wide dividend. It is somewhat faster than what currently exists
>> because of the lack of a loop whose iteration count is proportional to
>> log2(d), where d is the desired divisor (in other words, it's slower the
>> bigger the divisor is, whereas my algorithm is constant speed).
> In general, there should be no assembly language in the compiler. I also
> don't think that's worth it in this case. Unless (or maybe "even if")
> your code contains nothing but divisions by constants, I doubt this code
> has a significant effect on the total compile time.
Division by constants occurs fairly frequently in code. For
example, dividing by 10000 whenever Currency is used, and 1000 often
appears in timing measurements (e.g. if t is in milliseconds, then t div
1000 is seconds and t mod 1000 is the leftover milliseconds).
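To make the optimisation concrete, here is a minimal Python sketch (not compiler code) of how a division by the constant 1000 can be replaced with a multiplication and a shift. It assumes 32-bit unsigned operands and the common "round-up" magic number ceil(2^(N+l) / d); the names N, magic_for and split_ms are made up for illustration.

```python
N = 32  # assumed operand width in bits (hypothetical choice for this sketch)

def magic_for(d):
    """Return (m, shift) with (n * m) >> shift == n // d for all 0 <= n < 2**N."""
    assert d >= 1
    l = (d - 1).bit_length()        # ceil(log2(d))
    shift = N + l
    m = -(-(1 << shift) // d)       # ceil(2**shift / d) using integer arithmetic
    return m, shift

def split_ms(t):
    """Split a millisecond count into (seconds, leftover milliseconds)."""
    m, shift = magic_for(1000)
    seconds = (t * m) >> shift      # multiply and shift instead of t div 1000
    millis = t - seconds * 1000     # t mod 1000 derived from the quotient
    return seconds, millis

print(split_ms(123456))  # (123, 456)
```

Note that the magic number here can need N+1 bits, which is why real code generators usually split it into a low word plus an extra add-and-shift step.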
The existing code contains two divisions by a variable (so they can't be
optimised) and a loop that has, at most, N iterations, where N is the
bit size (often 32 or 64). The loop contains only addition, subtraction
and multiplication, and 3 branches to contend with (not including the
repeat...until jump). My code contains a single DIV, plus a BSR that
effectively computes the base-2 logarithm of the divisor (it also
raises an internal error if the divisor is zero, since that case should
have been caught already).
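For illustration only, a loop-free calculation of this kind could look like the Python sketch below. It assumes the Granlund-Montgomery round-up scheme, which is not necessarily the formula used in 'compiler/cgutils.pas' or in the proposed replacement. Python's bit_length stands in for BSR, and the big-integer division stands in for one DIV with a double-wide dividend; the names magic_unsigned and div_by_const are hypothetical.

```python
N = 64  # assumed register width in bits (hypothetical choice for this sketch)

def magic_unsigned(d):
    """Return (m_low, l): the low N bits of the (N+1)-bit magic, and ceil(log2(d))."""
    if d == 0:
        raise ValueError("divisor must be non-zero")  # models the internal error
    l = (d - 1).bit_length()     # ceil(log2(d)); the role BSR plays in assembly
    hi = (1 << l) - d            # high word of the dividend; hi < d, so no overflow
    q = (hi << N) // d           # models a single DIV with hi:0 as the dividend
    return q + 1, l

def div_by_const(n, d):
    """Compute n // d for 0 <= n < 2**N using only multiply, add and shifts."""
    m_low, l = magic_unsigned(d)
    t = (n * m_low) >> N         # high half of the N x N bit product
    sh1 = min(l, 1)
    sh2 = max(l - 1, 0)
    return (t + ((n - t) >> sh1)) >> sh2

print(div_by_const(123456789, 10000))  # 12345
```

The work in magic_unsigned does not depend on the size of d: one bit scan, one subtraction and one division, with no loop.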
Granted, you may be right and the saving won't be worth it, not to
mention the additional complexity (and my function currently fails on
certain divisors unexpectedly, so I'll have to do some deeper testing if
just for my own peace of mind!) - only a lot of timing tests will
determine that. Nevertheless, thanks for providing the article on
calculating the reciprocal - that's definitely helpful in
understanding what's going on.
Gareth aka. Kit