[fpc-devel] ARM/AARCH64 work
J. Gareth Moreton
gareth at moreton-family.com
Mon Apr 26 08:09:52 CEST 2021
Hi everyone,
Here's a quick update on my work in progress on ARM/AArch64. First, the annoying news (besides the
broken laptop): I've mislaid the MicroSD card with the 32-bit ARM system for my Raspberry Pi, so I
can't test on that platform until I find it. If it doesn't turn up, I'll have to buy a new one and
wait for my laptop to return so I can flash the 32-bit Raspberry Pi OS onto it.
In terms of actual development, I've been pursuing two things so far. One is some improved
peephole optimisations for ldr and str instructions, and the other is implementing "magic division",
where division by a constant is replaced with a multiplication. The ldr/str optimisations have
stalled for the moment because of the heap corruption bug that occurs on trunk, which my
optimisations seem to expose a bit more. My magic-div changes are almost there, but I'm having
problems with very large numbers. None of the dedicated division tests picked the problem up, but I
got some mysterious failures elsewhere, and I eventually found a reproducible case in a benchmark
test I'm writing. The benchmark also shows the speed improvements when built under -O2:
Trunk:
Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
Unsigned 32-bit division by 2 - Pass - average iteration duration: 2.095 ns
Unsigned 32-bit division by 3 - Pass - average iteration duration: 4.191 ns
Unsigned 32-bit division by 10 - Pass - average iteration duration: 3.958 ns
Unsigned 32-bit division by 100 - Pass - average iteration duration: 3.492 ns
Unsigned 64-bit division by 2 - Pass - average iteration duration: 2.095 ns
Unsigned 64-bit division by 3 - Pass - average iteration duration: 4.191 ns
Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.958 ns
Unsigned 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
Unsigned 64-bit division by 1,000,000,000 - Pass - average iteration duration: 6.519 ns
Signed 64-bit division by 10 - Pass - average iteration duration: 4.191 ns
Signed 64-bit division by 18 - Pass - average iteration duration: 3.958 ns
Signed 64-bit division by 24 - Pass - average iteration duration: 3.725 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 6.985 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 5.821 ns
ok
- Sum of average durations: 59.372 ns
- Overall average duration: 4.241 ns
magic-div:
Division compilation and timing test (using constants from System and Sysutils)
-------------------------------------------------------------------------------
Unsigned 32-bit division by 2 - Pass - average iteration duration: 1.630 ns
Unsigned 32-bit division by 3 - Pass - average iteration duration: 2.328 ns
Unsigned 32-bit division by 10 - Pass - average iteration duration: 2.328 ns
Unsigned 32-bit division by 100 - Pass - average iteration duration: 2.328 ns
Unsigned 64-bit division by 2 - Pass - average iteration duration: 1.630 ns
Unsigned 64-bit division by 3 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 5 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
Unsigned 64-bit division by 1,000,000,000 - FAIL - 18446744073709551615 div 1000000000; expected 18446744073 got 1266874893
Signed 64-bit division by 10 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 18 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 24 - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 10,000 (Currency) - Pass - average iteration duration: 3.027 ns
Signed 64-bit division by 86,400,000 - Pass - average iteration duration: 3.027 ns
I figure once I fix that failure, I can submit a patch. I'll submit the bench test too because it
will be good for speed comparisons and can act as a test case itself.
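
For readers unfamiliar with the technique, here is a minimal sketch of unsigned "magic division" in
Python, using arbitrary-precision integers so nothing overflows. It is not the FPC implementation:
the helper names magic_unsigned and div_by_magic are hypothetical, and the constant is derived with
the textbook round-up method (m = ceil(2^(bits+l) / d), with l = ceil(log2(d))). With that method
the multiplier needs bits+1 bits, so keeping only the low 64 bits gives wrong quotients. That is a
well-known hazard of the scheme; whether it is related to the failure above is not established here.

```python
def magic_unsigned(d, bits):
    """Return (m, shift) such that n // d == (n * m) >> shift for 0 <= n < 2**bits."""
    if d < 1:
        raise ValueError("divisor must be positive")
    l = (d - 1).bit_length()          # ceil(log2(d))
    shift = bits + l
    m = -(-(1 << shift) // d)         # ceil(2**shift / d)
    return m, shift


def div_by_magic(n, m, shift):
    """Divide using one multiplication and one right shift."""
    return (n * m) >> shift


if __name__ == "__main__":
    bits = 64
    n = 2**bits - 1                   # largest unsigned 64-bit value
    for d in (3, 5, 10, 1_000_000_000):
        m, shift = magic_unsigned(d, bits)
        # With unlimited precision the multiply-and-shift matches true division.
        assert div_by_magic(n, m, shift) == n // d
        # The multiplier produced this way always needs bits + 1 bits.
        assert m.bit_length() == bits + 1
        # Simulate keeping only the low 64 bits of the multiplier (hypothetical bug).
        truncated = m & (2**bits - 1)
        print(d, hex(m), div_by_magic(n, truncated, shift) == n // d)
```

Real code generators typically deal with the extra bit using an add-and-shift fix-up sequence, or
by shifting out trailing zero bits of an even divisor first so that a smaller multiplier suffices.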
Gareth aka. Kit