[fpc-devel] Optimization for 'mod'

Fri Jan 5 16:05:17 CET 2018

I've used DivMod myself before, akin to something like this under Windows:

{$ASMMODE Intel}
function DivMod(N, D: QWord; var R: QWord): QWord; assembler; nostackframe;
asm
  MOV R9, RDX
  MOV RAX, RCX
  XOR RDX, RDX   { Zero RDX because the DIV operation uses RDX:RAX as the numerator }
  DIV R9         { Quotient is in RAX and remainder is in RDX }
  MOV [R8], RDX
end;

The drawback is the overhead of calling a procedure, and the fact it uses DIV, which can take upwards of 50 
cycles to execute for 64-bit operands. It's a very slow command, hence the 9-nanosecond metric that I 
measured compared to times under 2 nanoseconds if converted into a multiplication. True, you could write a 
Pascal variant and inline the function to avoid the expensive procedure call, but you're still using DIV.

If you know the divisor is a constant, then you can change the division into a multiplication by calculating 
a reciprocal value (along with an addition and bit shift to correct rounding errors), which is far faster to 
execute.  Also, most programmers won't think to use something like DivMod, or may not want to for sake of 
convenience, especially with something like the following:

WriteLn(Format('Time taken: %d:%.02d', [Seconds div 60, Seconds mod 60]));

Kit