[fpc-devel] The "magic div" algorithm

J. Gareth Moreton gareth at moreton-family.com
Fri Sep 3 17:35:24 CEST 2021


Hey Marģers,

So I've been experimenting with your suggestion, and it looks like a 
resounding success!  I added some new tests to the "bdiv" bench test to 
see how it performs.  16-bit multiplications don't get improved as well 
as they could be on x86_64 because the intermediate values are all 
extended to 32-bit, but you can still see a massive improvement on all 
the unsigned divisions by 7.

I'm currently running the test suite for i386-win32 and x86_64-win64, 
then will implement the same system for AArch64. Thanks so much for 
spotting the improvement.

Gareth aka. Kit

P.S. I still need to work on the signed modulus at some point.

P.P.S. Timings:

Trunk:

Division compilation and timing test (using constants from System and 
Sysutils)
-------------------------------------------------------------------------------
               Unsigned 16-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 16-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 16-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 16-bit modulus by 2 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 16-bit division by 3 - Pass - average iteration 
duration: 0.640 ns
                Unsigned 16-bit modulus by 3 - Pass - average iteration 
duration: 0.757 ns
               Unsigned 16-bit division by 7 - Pass - average iteration 
duration: 0.873 ns
                Unsigned 16-bit modulus by 7 - Pass - average iteration 
duration: 0.990 ns
              Unsigned 16-bit division by 10 - Pass - average iteration 
duration: 0.582 ns
               Unsigned 16-bit modulus by 10 - Pass - average iteration 
duration: 0.815 ns
             Unsigned 16-bit division by 100 - Pass - average iteration 
duration: 0.582 ns
              Unsigned 16-bit modulus by 100 - Pass - average iteration 
duration: 0.815 ns
           Unsigned 16-bit division by 1,000 - Pass - average iteration 
duration: 0.582 ns
            Unsigned 16-bit modulus by 1,000 - Pass - average iteration 
duration: 0.757 ns
               Unsigned 32-bit division by 1 - Pass - average iteration 
duration: 0.291 ns
                Unsigned 32-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 32-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 32-bit modulus by 2 - Pass - average iteration 
duration: 0.291 ns
               Unsigned 32-bit division by 3 - Pass - average iteration 
duration: 0.582 ns
                Unsigned 32-bit modulus by 3 - Pass - average iteration 
duration: 0.698 ns
               Unsigned 32-bit division by 7 - Pass - average iteration 
duration: 0.815 ns
                Unsigned 32-bit modulus by 7 - Pass - average iteration 
duration: 0.990 ns
              Unsigned 32-bit division by 10 - Pass - average iteration 
duration: 0.582 ns
               Unsigned 32-bit modulus by 10 - Pass - average iteration 
duration: 0.757 ns
             Unsigned 32-bit division by 100 - Pass - average iteration 
duration: 0.582 ns
              Unsigned 32-bit modulus by 100 - Pass - average iteration 
duration: 0.698 ns
           Unsigned 32-bit division by 1,000 - Pass - average iteration 
duration: 0.524 ns
            Unsigned 32-bit modulus by 1,000 - Pass - average iteration 
duration: 0.757 ns
          Unsigned 32-bit division by 60,000 - Pass - average iteration 
duration: 0.524 ns
           Unsigned 32-bit modulus by 60,000 - Pass - average iteration 
duration: 0.698 ns
         Unsigned 32-bit division by 146,097 - Pass - average iteration 
duration: 0.582 ns
          Unsigned 32-bit modulus by 146,097 - Pass - average iteration 
duration: 0.698 ns
       Unsigned 32-bit division by 3,600,000 - Pass - average iteration 
duration: 0.582 ns
        Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration 
duration: 0.698 ns
               Unsigned 64-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 64-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 64-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 64-bit modulus by 2 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 64-bit division by 3 - Pass - average iteration 
duration: 0.582 ns
                Unsigned 64-bit modulus by 3 - Pass - average iteration 
duration: 0.698 ns
               Unsigned 64-bit division by 7 - Pass - average iteration 
duration: 0.873 ns
                Unsigned 64-bit modulus by 7 - Pass - average iteration 
duration: 0.990 ns
              Unsigned 64-bit division by 10 - Pass - average iteration 
duration: 0.524 ns
               Unsigned 64-bit modulus by 10 - Pass - average iteration 
duration: 0.698 ns
             Unsigned 64-bit division by 100 - Pass - average iteration 
duration: 0.815 ns
              Unsigned 64-bit modulus by 100 - Pass - average iteration 
duration: 0.990 ns
   Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration 
duration: 0.815 ns
    Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration 
duration: 0.990 ns
                 Signed 32-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                  Signed 32-bit modulus by 1 - Pass - average iteration 
duration: 0.291 ns
               Signed 32-bit division by 100 - Pass - average iteration 
duration: 0.698 ns
                Signed 32-bit modulus by 100 - Pass - average iteration 
duration: 6.403 ns
                 Signed 64-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                  Signed 64-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
                Signed 64-bit division by 10 - Pass - average iteration 
duration: 0.582 ns
                 Signed 64-bit modulus by 10 - Pass - average iteration 
duration: 6.577 ns
                Signed 64-bit division by 18 - Pass - average iteration 
duration: 0.582 ns
                 Signed 64-bit modulus by 18 - Pass - average iteration 
duration: 6.461 ns
                Signed 64-bit division by 24 - Pass - average iteration 
duration: 0.640 ns
                 Signed 64-bit modulus by 24 - Pass - average iteration 
duration: 6.519 ns
               Signed 64-bit division by 100 - Pass - average iteration 
duration: 0.757 ns
                Signed 64-bit modulus by 100 - Pass - average iteration 
duration: 6.461 ns
               Signed 64-bit division by 153 - Pass - average iteration 
duration: 0.640 ns
                Signed 64-bit modulus by 153 - Pass - average iteration 
duration: 6.519 ns
             Signed 64-bit division by 1,461 - Pass - average iteration 
duration: 0.698 ns
              Signed 64-bit modulus by 1,461 - Pass - average iteration 
duration: 6.577 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration 
duration: 0.640 ns
  Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration 
duration: 6.694 ns
        Signed 64-bit division by 86,400,000 - Pass - average iteration 
duration: 0.640 ns
         Signed 64-bit modulus by 86,400,000 - Pass - average iteration 
duration: 6.636 ns

ok
- Sum of average durations: 96.275 ns
- Overall average duration: 1.375 ns

Extended magic-div:

Division compilation and timing test (using constants from System and 
Sysutils)
-------------------------------------------------------------------------------
               Unsigned 16-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 16-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 16-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 16-bit modulus by 2 - Pass - average iteration 
duration: 0.291 ns
               Unsigned 16-bit division by 3 - Pass - average iteration 
duration: 0.524 ns
                Unsigned 16-bit modulus by 3 - Pass - average iteration 
duration: 0.640 ns
               Unsigned 16-bit division by 7 - Pass - average iteration 
duration: 0.466 ns
                Unsigned 16-bit modulus by 7 - Pass - average iteration 
duration: 0.640 ns
              Unsigned 16-bit division by 10 - Pass - average iteration 
duration: 0.582 ns
               Unsigned 16-bit modulus by 10 - Pass - average iteration 
duration: 0.698 ns
             Unsigned 16-bit division by 100 - Pass - average iteration 
duration: 0.524 ns
              Unsigned 16-bit modulus by 100 - Pass - average iteration 
duration: 0.582 ns
           Unsigned 16-bit division by 1,000 - Pass - average iteration 
duration: 0.524 ns
            Unsigned 16-bit modulus by 1,000 - Pass - average iteration 
duration: 0.640 ns
               Unsigned 32-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 32-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 32-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 32-bit modulus by 2 - Pass - average iteration 
duration: 0.291 ns
               Unsigned 32-bit division by 3 - Pass - average iteration 
duration: 0.466 ns
                Unsigned 32-bit modulus by 3 - Pass - average iteration 
duration: 0.582 ns
               Unsigned 32-bit division by 7 - Pass - average iteration 
duration: 0.524 ns
                Unsigned 32-bit modulus by 7 - Pass - average iteration 
duration: 0.582 ns
              Unsigned 32-bit division by 10 - Pass - average iteration 
duration: 0.524 ns
               Unsigned 32-bit modulus by 10 - Pass - average iteration 
duration: 0.640 ns
             Unsigned 32-bit division by 100 - Pass - average iteration 
duration: 0.524 ns
              Unsigned 32-bit modulus by 100 - Pass - average iteration 
duration: 0.582 ns
           Unsigned 32-bit division by 1,000 - Pass - average iteration 
duration: 0.466 ns
            Unsigned 32-bit modulus by 1,000 - Pass - average iteration 
duration: 0.582 ns
          Unsigned 32-bit division by 60,000 - Pass - average iteration 
duration: 0.524 ns
           Unsigned 32-bit modulus by 60,000 - Pass - average iteration 
duration: 0.582 ns
         Unsigned 32-bit division by 146,097 - Pass - average iteration 
duration: 0.524 ns
          Unsigned 32-bit modulus by 146,097 - Pass - average iteration 
duration: 0.524 ns
       Unsigned 32-bit division by 3,600,000 - Pass - average iteration 
duration: 0.524 ns
        Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration 
duration: 0.524 ns
               Unsigned 64-bit division by 1 - Pass - average iteration 
duration: 0.291 ns
                Unsigned 64-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Unsigned 64-bit division by 2 - Pass - average iteration 
duration: 0.349 ns
                Unsigned 64-bit modulus by 2 - Pass - average iteration 
duration: 0.407 ns
               Unsigned 64-bit division by 3 - Pass - average iteration 
duration: 0.582 ns
                Unsigned 64-bit modulus by 3 - Pass - average iteration 
duration: 0.698 ns
               Unsigned 64-bit division by 7 - Pass - average iteration 
duration: 0.815 ns
                Unsigned 64-bit modulus by 7 - Pass - average iteration 
duration: 1.048 ns
              Unsigned 64-bit division by 10 - Pass - average iteration 
duration: 0.582 ns
               Unsigned 64-bit modulus by 10 - Pass - average iteration 
duration: 0.640 ns
             Unsigned 64-bit division by 100 - Pass - average iteration 
duration: 0.815 ns
              Unsigned 64-bit modulus by 100 - Pass - average iteration 
duration: 0.990 ns
   Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration 
duration: 0.757 ns
    Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration 
duration: 1.048 ns
                 Signed 32-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                  Signed 32-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
               Signed 32-bit division by 100 - Pass - average iteration 
duration: 0.757 ns
                Signed 32-bit modulus by 100 - Pass - average iteration 
duration: 6.461 ns
                 Signed 64-bit division by 1 - Pass - average iteration 
duration: 0.349 ns
                  Signed 64-bit modulus by 1 - Pass - average iteration 
duration: 0.349 ns
                Signed 64-bit division by 10 - Pass - average iteration 
duration: 0.640 ns
                 Signed 64-bit modulus by 10 - Pass - average iteration 
duration: 6.694 ns
                Signed 64-bit division by 18 - Pass - average iteration 
duration: 0.698 ns
                 Signed 64-bit modulus by 18 - Pass - average iteration 
duration: 6.636 ns
                Signed 64-bit division by 24 - Pass - average iteration 
duration: 0.698 ns
                 Signed 64-bit modulus by 24 - Pass - average iteration 
duration: 6.461 ns
               Signed 64-bit division by 100 - Pass - average iteration 
duration: 0.757 ns
                Signed 64-bit modulus by 100 - Pass - average iteration 
duration: 6.519 ns
               Signed 64-bit division by 153 - Pass - average iteration 
duration: 0.640 ns
                Signed 64-bit modulus by 153 - Pass - average iteration 
duration: 6.577 ns
             Signed 64-bit division by 1,461 - Pass - average iteration 
duration: 0.698 ns
              Signed 64-bit modulus by 1,461 - Pass - average iteration 
duration: 6.694 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration 
duration: 0.698 ns
  Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration 
duration: 6.694 ns
        Signed 64-bit division by 86,400,000 - Pass - average iteration 
duration: 0.698 ns
         Signed 64-bit modulus by 86,400,000 - Pass - average iteration 
duration: 6.577 ns

ok
- Sum of average durations: 93.540 ns
- Overall average duration: 1.336 ns


-- 
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



More information about the fpc-devel mailing list