[fpc-devel] ARM/AARCH64 work

J. Gareth Moreton gareth at moreton-family.com
Mon Apr 26 08:09:52 CEST 2021


Hi everyone,

So, a quick update on my current work in progress on ARM/AArch64.  First the annoying news, besides 
the broken laptop: I've mislaid my ARM (32-bit) MicroSD card for the Raspberry Pi, so I can't test 
on that platform until I find it again.  If it doesn't turn up, I'll have to buy a new one and wait 
for my laptop to return so I can flash the 32-bit Raspberry Pi OS onto it.

In terms of actual development, I've been pursuing a couple of things so far.  One is some improved 
peephole optimisations for ldr and str instructions, and the other is implementing "magic division", 
where division by a constant is replaced with a multiplication.  The ldr/str optimisations have 
stalled for the moment because of the heap corruption bug that occurs on the trunk, which my 
optimisations seem to expose a bit more.  My magic-div changes are almost there, but I'm having 
problems with very large numbers.  None of the dedicated division tests picked the problem up, but I 
got some mysterious failures elsewhere, and I eventually found a reproducible case in a benchmark 
test I'm writing.  The benchmark also shows the speed improvements when built under -O2:

Trunk:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 32-bit division by 2 - Pass - average iteration duration: 2.095 ns
              Unsigned 32-bit division by 3 - Pass - average iteration duration: 4.191 ns
             Unsigned 32-bit division by 10 - Pass - average iteration duration: 3.958 ns
            Unsigned 32-bit division by 100 - Pass - average iteration duration: 3.492 ns
              Unsigned 64-bit division by 2 - Pass - average iteration duration: 2.095 ns
              Unsigned 64-bit division by 3 - Pass - average iteration duration: 4.191 ns
              Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.958 ns
             Unsigned 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
  Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 6.519 ns
               Signed 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
               Signed 64-bit division by 18 - Pass - average iteration duration: 3.958 ns
               Signed 64-bit division by 24 - Pass - average iteration duration: 3.725 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 6.985 ns
       Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 5.821 ns

ok
- Sum of average durations: 59.372 ns
- Overall average duration: 4.241 ns

magic-div:

Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
              Unsigned 32-bit division by 2 - Pass - average iteration duration: 1.630 ns
              Unsigned 32-bit division by 3 - Pass - average iteration duration: 2.328 ns
             Unsigned 32-bit division by 10 - Pass - average iteration duration: 2.328 ns
            Unsigned 32-bit division by 100 - Pass - average iteration duration: 2.328 ns
              Unsigned 64-bit division by 2 - Pass - average iteration duration: 1.630 ns
              Unsigned 64-bit division by 3 - Pass - average iteration duration: 3.027 ns
              Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.027 ns
             Unsigned 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
  Unsigned 64-bit division by 1,000,000,000 - FAIL - 18446744073709551615 div 1000000000; expected 18446744073 got 1266874893
               Signed 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
               Signed 64-bit division by 18 - Pass - average iteration duration: 3.027 ns
               Signed 64-bit division by 24 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 3.027 ns
       Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 3.027 ns

I figure once I fix that failure, I can submit a patch.  I'll submit the bench test too because it 
will be good for speed comparisons and can act as a test case itself.
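
For anyone unfamiliar with the "magic division" idea mentioned above, here is a minimal sketch of the 
unsigned case in Python. It is not the compiler's code. The function names magic_unsigned and 
div_by_magic are made up for illustration, and it uses the textbook round-up multiplier 
m = ceil(2^(bits + l) / d) with l = ceil(log2(d)). Python's unbounded integers hide the register-width 
concerns a real code generator has to deal with.

```python
def magic_unsigned(d, bits=64):
    """Return (m, shift) with n // d == (n * m) >> shift for all 0 <= n < 2**bits.

    Illustrative helper (hypothetical name), using the round-up multiplier.
    """
    if not 1 <= d < (1 << bits):
        raise ValueError("divisor must be non-zero and fit in the operand width")
    l = (d - 1).bit_length()           # ceil(log2(d))
    shift = bits + l
    m = ((1 << shift) + d - 1) // d    # ceil(2**shift / d)
    return m, shift


def div_by_magic(n, d, bits=64):
    """Divide n by the constant d using only a multiply and a right shift."""
    m, shift = magic_unsigned(d, bits)
    return (n * m) >> shift


if __name__ == "__main__":
    n = 2**64 - 1                      # the operand from the failing benchmark line
    for d in (2, 3, 5, 10, 100, 1_000_000_000):
        assert div_by_magic(n, d) == n // d
    m, shift = magic_unsigned(1_000_000_000)
    print("multiplier bits:", m.bit_length(), "shift:", shift)
```

With this particular formula, the multiplier for 1,000,000,000 is wider than 64 bits, so a 64-bit 
implementation needs an extra fix-up step or a differently derived constant. Whether that has 
anything to do with the failure reported above is not established here.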

Gareth aka. Kit

