[fpc-devel] Improving i8086 performance..
nickysn at gmail.com
Sat Dec 28 00:15:41 CET 2013
On 12/23/2013 03:34 PM, Max Nazhalov wrote:
> Hello, Everybody!
> Can anyone having the real i8086 hardware check attached MUL-helpers?
> I've tested them on a modern Intel CPU -- "mul_dword" is about 4.5..5
> times faster comparing to the generic FPC implementation, and
> "mul_qword" is about 18..20, but these numbers surely should be quite
> different for the real i8086 due to the progress in the today's ALU design.
> Just curious how "fast" they can be in the silicon of those days..
I made a small benchmark.
The generic Pascal version is compiled with -O2 for the 8086/8088 target,
so it doesn't use any 186+ instructions.
I ran it on my HP 200LX, which has a 7.91 MHz Intel "Hornet" 80186 CPU.
Here are the results:
32-bit multiplication, N=10 (40960 multiplications)
mul32pas: 1902 ticks
mul32asm: 71 ticks
Or, in other words, 26.8 times faster. Not bad at all! :)
64-bit multiplication, N=10 (20480 multiplications)
mul64pas: 2704 ticks
mul64asm: 77 ticks
So the 64-bit multiplication got 35.1 times faster!
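For anyone who wants to recompute the ratios, here is a minimal sketch in Python. The tick counts are copied from the results above; the names `results` and `speedup` are made up for this illustration and are not part of the benchmark program.

```python
# Tick counts copied from the HP 200LX results above.
# The dictionary layout and function name are illustrative only.
results = {
    "mul32": {"pas": 1902, "asm": 71},
    "mul64": {"pas": 2704, "asm": 77},
}


def speedup(pas_ticks, asm_ticks):
    """Return how many times faster the asm helper is than the Pascal one."""
    return pas_ticks / asm_ticks


for name, ticks in results.items():
    print(f"{name}: {speedup(ticks['pas'], ticks['asm']):.1f}x faster")
```

Since both versions are timed over the same number of multiplications, the ratio of tick counts is also the per-multiplication speedup, whatever the length of a tick is.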
However, due to the holidays, I'm not at home, where I have an IBM 5150
with an 8088 and a NEC V20 CPU. Both CPUs are pin-compatible, but the NEC
is slightly faster at the same clock speed, and its mul and div
instructions in particular are faster, so it'd be interesting to test
both. In about 2 weeks I'll be back home and I'll be able to test it on
these CPUs. But it'd be nice if Jim could run the test on his 8088
machine, because we'd know the results sooner and it would save me from
having to swap processors on the 5150, which currently has the NEC
installed.
Also, where I currently am, I have access to a 286 machine with a broken
PSU, so in the next few days, I'm planning to replace the PSU and run
the test on a 286 as well :)
As for correctness, I ran the fpc testsuite with no regressions and
reviewed the asm code (and checked the math). It looks correct, but I
still haven't reviewed the overflow checking part of the 64-bit
multiplication routine. I'll commit the patch when I finish that.
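To illustrate the kind of math being checked, here is a rough Python sketch of how a 32-bit unsigned multiplication can be built from 16x16->32 products, which is what a 16-bit MUL provides. This is the generic textbook decomposition, not the code from the patch. The names `mul16` and `mul32_via_16` and the unsigned overflow rule are assumptions made for this example.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul16(a, b):
    """16x16 -> 32-bit unsigned product (stand-in for a 16-bit MUL)."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32_via_16(x, y):
    """Return (low 32 bits of x*y, unsigned overflow flag).

    Hypothetical helper for illustration; x and y are unsigned 32-bit.
    """
    x_lo, x_hi = x & MASK16, (x >> 16) & MASK16
    y_lo, y_hi = y & MASK16, (y >> 16) & MASK16

    low = mul16(x_lo, y_lo)                         # contributes at bit 0
    cross = mul16(x_lo, y_hi) + mul16(x_hi, y_lo)   # contributes at bit 16
    # x_hi * y_hi would contribute at bit 32, so it never reaches the
    # truncated result; it only matters for overflow detection.

    total = low + ((cross & MASK16) << 16)
    result = total & MASK32

    overflow = (
        (x_hi != 0 and y_hi != 0)   # the bit-32 term is non-zero
        or cross > MASK16           # cross terms spill past bit 31
        or total > MASK32           # carry out of the final addition
    )
    return result, overflow


print(mul32_via_16(65535, 65537))
print(mul32_via_16(0x10000, 0x10000))
```

The truncated result needs only three 16-bit multiplies. The overflow flag is the part that takes extra care, because every discarded high part has to be accounted for.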
Anyway, thanks for the patch. It looks great so far, and the speedup
seems to be even greater on old processors than on modern ones.