[fpc-devel] Improving i8086 performance..

Sat Dec 28 00:15:41 CET 2013

On 12/23/2013 03:34 PM, Max Nazhalov wrote:
> Hello, Everybody!
>
> Can anyone having the real i8086 hardware check attached MUL-helpers?
> I've tested them on a modern Intel CPU -- "mul_dword" is about 4.5..5
> times faster compared to the generic FPC implementation, and
> "mul_qword" is about 18..20, but these numbers surely should be quite
> different for the real i8086 due to the progress in today's ALU design.
> Just curious how "fast" they can be in the silicon of those days..
I made a small benchmark:

http://debian.fmi.uni-sofia.bg/~nickysn/fpc-8086/mul-benchmark/

The generic Pascal version is compiled with -O2 for the 8086/8088, so it
doesn't use any 186+ instructions.

I ran it on my HP 200LX, which has a 7.91 MHz Intel "Hornet" 80186 CPU. 
Here are the results:

32-bit multiplication, N=10 (40960 multiplications)

mul32pas: 1902 ticks
mul32asm: 71 ticks

Or, in other words, 26.8 times faster. Not bad at all! :)

64-bit multiplication, N=10 (20480 multiplications)

mul64pas: 2704 ticks
mul64asm: 77 ticks

So the 64-bit multiplication got 35.1 times faster!
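
For anyone who wants to redo the arithmetic, here is a minimal Python
sketch that recomputes the speedup factors from the tick counts above.
The dictionary layout and the function name `speedup` are just
illustrative; only the numbers come from the benchmark run.

```python
# Tick counts and multiplication counts copied from the results above.
results = {
    "mul32": {"pas": 1902, "asm": 71, "count": 40960},
    "mul64": {"pas": 2704, "asm": 77, "count": 20480},
}


def speedup(pas_ticks, asm_ticks):
    """Return how many times faster the asm helper is than the Pascal one."""
    return pas_ticks / asm_ticks


for name, r in results.items():
    print(f"{name}: {speedup(r['pas'], r['asm']):.1f}x faster")
# prints:
# mul32: 26.8x faster
# mul64: 35.1x faster
```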

However, because of the holidays, I'm not at home, where I have an IBM
5150 with both an 8088 and a NEC V20 CPU. The two CPUs are pin
compatible, but the NEC is slightly faster at the same clock speed, and
its mul and div instructions in particular are faster, so it'd be
interesting to test both. I'll be back home in about 2 weeks and will be
able to test on these CPUs then. Still, it'd be nice if Jim could run
the test on his 8088 machine: we'd know the results sooner, and it would
save me from having to swap processors on the 5150, which currently has
the NEC installed.

Also, where I currently am, I have access to a 286 machine with a broken 
PSU, so in the next few days, I'm planning to replace the PSU and run 
the test on a 286 as well :)

As for correctness, I ran the FPC test suite with no regressions,
reviewed the asm code and checked the math. It looks correct, but I
still haven't reviewed the overflow checking part of the 64-bit
multiplication routine. I'll commit the patch when I finish that.
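
To illustrate the kind of math being checked, here is a rough Python
model of how a 32-bit unsigned multiply can be built from 16x16->32
products, which is what the 8086 MUL instruction provides. This is only
a sketch of the general technique, not the code from the patch: the
names `mul16` and `mul32_from_16` are made up for the example, and the
overflow rule shown is the textbook one for the unsigned case.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def mul16(x, y):
    """Model of a 16x16 -> 32-bit unsigned multiply (like 8086 MUL)."""
    assert 0 <= x <= MASK16 and 0 <= y <= MASK16
    return x * y


def mul32_from_16(a, b):
    """Unsigned 32x32 multiply assembled from 16-bit halves.

    Returns (low 32 bits of the product, overflow flag).
    Hypothetical helper, for illustration only.
    """
    a_lo, a_hi = a & MASK16, a >> 16
    b_lo, b_hi = b & MASK16, b >> 16

    low = mul16(a_lo, b_lo)                        # weight 2^0
    cross = mul16(a_hi, b_lo) + mul16(a_lo, b_hi)  # weight 2^16

    # The a_hi * b_hi term has weight 2^32, so it never reaches the low
    # 32 bits; it only matters for overflow detection.
    total = low + (cross << 16)
    overflow = (a_hi != 0 and b_hi != 0) or total > MASK32
    return total & MASK32, overflow


# Quick self-check against Python's arbitrary-precision multiply.
for a, b in [(0, 0), (123456, 7890), (0xFFFF, 0xFFFF),
             (0x10000, 0x10000), (0xFFFFFFFF, 2), (0xFFFFFFFF, 0xFFFFFFFF)]:
    assert mul32_from_16(a, b) == ((a * b) & MASK32, a * b > MASK32)
```

The 64-bit case follows the same pattern with more partial products,
which is why its overflow check takes more care to review.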

Anyway, thanks for the patch. It looks great so far, and the speedup
seems to be even greater on old processors than on modern ones.

Nikolay