[fpc-devel] The "magic div" algorithm
J. Gareth Moreton
gareth at moreton-family.com
Fri Sep 3 17:35:24 CEST 2021
Hey Marģers,
So I've been experimenting with your suggestion, and it looks like a
resounding success! I added some new tests to the "bdiv" bench test to
see how it performs. 16-bit multiplications don't get improved as well
as they could be on x86_64 because the intermediate values are all
extended to 32-bit, but you can still see a massive improvement on all
the unsigned divisions by 7.
I'm currently running the test suite for i386-win32 and x86_64-win64,
then will implement the same system for AArch64. Thanks so much for
spotting the improvement.
Gareth aka. Kit
P.S. I still need to work on the signed modulus at some point.
P.P.S. Timings:
Trunk:
Division compilation and timing test (using constants from System and
Sysutils)
-------------------------------------------------------------------------------
Unsigned 16-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit modulus by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit division by 3 - Pass - average iteration
duration: 0.640 ns
Unsigned 16-bit modulus by 3 - Pass - average iteration
duration: 0.757 ns
Unsigned 16-bit division by 7 - Pass - average iteration
duration: 0.873 ns
Unsigned 16-bit modulus by 7 - Pass - average iteration
duration: 0.990 ns
Unsigned 16-bit division by 10 - Pass - average iteration
duration: 0.582 ns
Unsigned 16-bit modulus by 10 - Pass - average iteration
duration: 0.815 ns
Unsigned 16-bit division by 100 - Pass - average iteration
duration: 0.582 ns
Unsigned 16-bit modulus by 100 - Pass - average iteration
duration: 0.815 ns
Unsigned 16-bit division by 1,000 - Pass - average iteration
duration: 0.582 ns
Unsigned 16-bit modulus by 1,000 - Pass - average iteration
duration: 0.757 ns
Unsigned 32-bit division by 1 - Pass - average iteration
duration: 0.291 ns
Unsigned 32-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 32-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 32-bit modulus by 2 - Pass - average iteration
duration: 0.291 ns
Unsigned 32-bit division by 3 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit modulus by 3 - Pass - average iteration
duration: 0.698 ns
Unsigned 32-bit division by 7 - Pass - average iteration
duration: 0.815 ns
Unsigned 32-bit modulus by 7 - Pass - average iteration
duration: 0.990 ns
Unsigned 32-bit division by 10 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit modulus by 10 - Pass - average iteration
duration: 0.757 ns
Unsigned 32-bit division by 100 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit modulus by 100 - Pass - average iteration
duration: 0.698 ns
Unsigned 32-bit division by 1,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 1,000 - Pass - average iteration
duration: 0.757 ns
Unsigned 32-bit division by 60,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 60,000 - Pass - average iteration
duration: 0.698 ns
Unsigned 32-bit division by 146,097 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit modulus by 146,097 - Pass - average iteration
duration: 0.698 ns
Unsigned 32-bit division by 3,600,000 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration
duration: 0.698 ns
Unsigned 64-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit modulus by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit division by 3 - Pass - average iteration
duration: 0.582 ns
Unsigned 64-bit modulus by 3 - Pass - average iteration
duration: 0.698 ns
Unsigned 64-bit division by 7 - Pass - average iteration
duration: 0.873 ns
Unsigned 64-bit modulus by 7 - Pass - average iteration
duration: 0.990 ns
Unsigned 64-bit division by 10 - Pass - average iteration
duration: 0.524 ns
Unsigned 64-bit modulus by 10 - Pass - average iteration
duration: 0.698 ns
Unsigned 64-bit division by 100 - Pass - average iteration
duration: 0.815 ns
Unsigned 64-bit modulus by 100 - Pass - average iteration
duration: 0.990 ns
Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration
duration: 0.815 ns
Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration
duration: 0.990 ns
Signed 32-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Signed 32-bit modulus by 1 - Pass - average iteration
duration: 0.291 ns
Signed 32-bit division by 100 - Pass - average iteration
duration: 0.698 ns
Signed 32-bit modulus by 100 - Pass - average iteration
duration: 6.403 ns
Signed 64-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Signed 64-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Signed 64-bit division by 10 - Pass - average iteration
duration: 0.582 ns
Signed 64-bit modulus by 10 - Pass - average iteration
duration: 6.577 ns
Signed 64-bit division by 18 - Pass - average iteration
duration: 0.582 ns
Signed 64-bit modulus by 18 - Pass - average iteration
duration: 6.461 ns
Signed 64-bit division by 24 - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 24 - Pass - average iteration
duration: 6.519 ns
Signed 64-bit division by 100 - Pass - average iteration
duration: 0.757 ns
Signed 64-bit modulus by 100 - Pass - average iteration
duration: 6.461 ns
Signed 64-bit division by 153 - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 153 - Pass - average iteration
duration: 6.519 ns
Signed 64-bit division by 1,461 - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 1,461 - Pass - average iteration
duration: 6.577 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration
duration: 6.694 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 86,400,000 - Pass - average iteration
duration: 6.636 ns
ok
- Sum of average durations: 96.275 ns
- Overall average duration: 1.375 ns
Extended magic-div:
Division compilation and timing test (using constants from System and
Sysutils)
-------------------------------------------------------------------------------
Unsigned 16-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 16-bit modulus by 2 - Pass - average iteration
duration: 0.291 ns
Unsigned 16-bit division by 3 - Pass - average iteration
duration: 0.524 ns
Unsigned 16-bit modulus by 3 - Pass - average iteration
duration: 0.640 ns
Unsigned 16-bit division by 7 - Pass - average iteration
duration: 0.466 ns
Unsigned 16-bit modulus by 7 - Pass - average iteration
duration: 0.640 ns
Unsigned 16-bit division by 10 - Pass - average iteration
duration: 0.582 ns
Unsigned 16-bit modulus by 10 - Pass - average iteration
duration: 0.698 ns
Unsigned 16-bit division by 100 - Pass - average iteration
duration: 0.524 ns
Unsigned 16-bit modulus by 100 - Pass - average iteration
duration: 0.582 ns
Unsigned 16-bit division by 1,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 16-bit modulus by 1,000 - Pass - average iteration
duration: 0.640 ns
Unsigned 32-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 32-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 32-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 32-bit modulus by 2 - Pass - average iteration
duration: 0.291 ns
Unsigned 32-bit division by 3 - Pass - average iteration
duration: 0.466 ns
Unsigned 32-bit modulus by 3 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit division by 7 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 7 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit division by 10 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 10 - Pass - average iteration
duration: 0.640 ns
Unsigned 32-bit division by 100 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 100 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit division by 1,000 - Pass - average iteration
duration: 0.466 ns
Unsigned 32-bit modulus by 1,000 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit division by 60,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 60,000 - Pass - average iteration
duration: 0.582 ns
Unsigned 32-bit division by 146,097 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 146,097 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit division by 3,600,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 32-bit modulus by 3,600,000 - Pass - average iteration
duration: 0.524 ns
Unsigned 64-bit division by 1 - Pass - average iteration
duration: 0.291 ns
Unsigned 64-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit division by 2 - Pass - average iteration
duration: 0.349 ns
Unsigned 64-bit modulus by 2 - Pass - average iteration
duration: 0.407 ns
Unsigned 64-bit division by 3 - Pass - average iteration
duration: 0.582 ns
Unsigned 64-bit modulus by 3 - Pass - average iteration
duration: 0.698 ns
Unsigned 64-bit division by 7 - Pass - average iteration
duration: 0.815 ns
Unsigned 64-bit modulus by 7 - Pass - average iteration
duration: 1.048 ns
Unsigned 64-bit division by 10 - Pass - average iteration
duration: 0.582 ns
Unsigned 64-bit modulus by 10 - Pass - average iteration
duration: 0.640 ns
Unsigned 64-bit division by 100 - Pass - average iteration
duration: 0.815 ns
Unsigned 64-bit modulus by 100 - Pass - average iteration
duration: 0.990 ns
Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration
duration: 0.757 ns
Unsigned 64-bit modulus by 1,000,000,000 - Pass - average iteration
duration: 1.048 ns
Signed 32-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Signed 32-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Signed 32-bit division by 100 - Pass - average iteration
duration: 0.757 ns
Signed 32-bit modulus by 100 - Pass - average iteration
duration: 6.461 ns
Signed 64-bit division by 1 - Pass - average iteration
duration: 0.349 ns
Signed 64-bit modulus by 1 - Pass - average iteration
duration: 0.349 ns
Signed 64-bit division by 10 - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 10 - Pass - average iteration
duration: 6.694 ns
Signed 64-bit division by 18 - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 18 - Pass - average iteration
duration: 6.636 ns
Signed 64-bit division by 24 - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 24 - Pass - average iteration
duration: 6.461 ns
Signed 64-bit division by 100 - Pass - average iteration
duration: 0.757 ns
Signed 64-bit modulus by 100 - Pass - average iteration
duration: 6.519 ns
Signed 64-bit division by 153 - Pass - average iteration
duration: 0.640 ns
Signed 64-bit modulus by 153 - Pass - average iteration
duration: 6.577 ns
Signed 64-bit division by 1,461 - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 1,461 - Pass - average iteration
duration: 6.694 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 10,000 (Currency) - Pass - average iteration
duration: 6.694 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration
duration: 0.698 ns
Signed 64-bit modulus by 86,400,000 - Pass - average iteration
duration: 6.577 ns
ok
- Sum of average durations: 93.540 ns
- Overall average duration: 1.336 ns
--
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus
More information about the fpc-devel
mailing list