[fpc-devel] Improving i8086 performance..

Thu Jan 2 23:52:41 CET 2014

On 01/01/2014 05:35 PM, Max Nazhalov wrote:
>> Date: Tue, 31 Dec 2013 19:42:44 +0200
>> From: Nikolay Nikolov <nickysn at gmail.com>
>>
>> I got my PSU fixed and now I have results from my 10 MHz PS/2 Model 30 286:
>> 32pas: ticks = 814
>> 32asm: ticks = 30
>> ~27x faster
>>
>> 64pas: ticks = 1130
>> 64asm: ticks = 30
>> ~38x faster
> Thanks for follow-up, Nikolay!
> I'm still looking for the performance bottlenecks in the current float<->ASCII conversion. Can you run the attached benchmark to clarify how much the uint32|uint64 multiplication affects it?

OK, I compiled it with the latest FPC trunk, using -O2 -WmMedium. Here are the binaries:

http://debian.fmi.uni-sofia.bg/~nickysn/fpc-8086/testf80/

Here are the results from my 286:

testf80p.exe:

Seed: 0x004A0020
Predefs: 0

Input count and press ENTER (blank to use the default=1000000) >1000
Count: 1000

[normal] null:      7.799 s
[normal] f2a:     129.510 s (t-null)
[normal] a2f:      89.703 s (t-f2a-null)
[subnormal] null:   4.771 s
[subnormal] f2a:  163.248 s (t-null)
[subnormal] a2f:   96.229 s (t-f2a-null)

Next seed: 0xFFB5FFDF

Note:
   Small count of numers will lead to arbitrarily imprecise/random timings.

testf80a.exe:

Seed: 0x2601002E
Predefs: 0

Input count and press ENTER (blank to use the default=1000000) >1000
Count: 1000

[normal] null:      4.500 s
[normal] f2a:      22.580 s (t-null)
[normal] a2f:      17.749 s (t-f2a-null)
[subnormal] null:   4.071 s
[subnormal] f2a:   27.558 s (t-null)
[subnormal] a2f:   18.730 s (t-f2a-null)

Next seed: 0xD9FEFFD1

Note:
   Small count of numers will lead to arbitrarily imprecise/random timings.
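
To compare the two runs directly, the timings can be divided pairwise. Below is a minimal sketch in Python that does this with the figures copied from the output above. It assumes that testf80p.exe is the pure Pascal build and testf80a.exe the assembler-assisted one (by analogy with the 32pas/32asm naming in the quoted message; this is not stated explicitly), and the names PAS, ASM and speedups are made up for illustration. Note that the two runs used different seeds, so the inputs are not identical and the ratios are only indicative.

```python
# Timings in seconds, copied from the two runs above (Count: 1000).
# Assumption: testf80p.exe = pure Pascal build, testf80a.exe = build with
# assembler helpers. The thread does not say this explicitly.
PAS = {
    "normal f2a": 129.510,
    "normal a2f": 89.703,
    "subnormal f2a": 163.248,
    "subnormal a2f": 96.229,
}
ASM = {
    "normal f2a": 22.580,
    "normal a2f": 17.749,
    "subnormal f2a": 27.558,
    "subnormal a2f": 18.730,
}


def speedups(slow, fast):
    """Return the slow/fast time ratio for every test present in both runs."""
    return {name: slow[name] / fast[name] for name in slow if name in fast}


if __name__ == "__main__":
    for name, ratio in speedups(PAS, ASM).items():
        print(f"{name:<14} {ratio:5.2f}x")
```

With these numbers every ratio comes out between roughly 5x and 6x, which is far below the ~27x and ~38x seen for the isolated multiplication routines in the quoted message.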

Nikolay