[fpc-pascal] Execution speed

Marco van de Voort Marcov at stack.nl
Sun Jan 26 12:53:25 CET 2003


> Now that the list seems to be working again...I'll try posting this again:
> 
> I am wondering why our new PC is not executing our fpc-compiled program
> very much faster than the old one. It was really quite a disappointment:
> 
> Old PC: Laptop, Intel PII, 300 MHz, 64 MB. Execution times: 8:30, 2:30 (min:sec)
> 
> New PC: Desktop, AMD Duron, 1.6 GHz, 128 MB. Execution times: 5:15, 1:15

I'm not the processor expert of the FPC team, but I'll give it a shot.
(Jonas and Florian will probably correct/comment on this heavily :-)

I'm afraid you have fallen for MHz-itis, i.e. the belief that the
throughput of a processor depends purely on the clock speed of the
CPU (in MHz).

Some important things I noticed immediately from your msg:

- There is still a nearly twofold speedup (about 1.6x for the first
  timing, exactly 2x for the second).
- You use 4 MB of memory, and I assume from your description that it
  is accessed in a fairly random pattern.
- The Duron has less cache than an Athlon, and the Duron's cache is
  probably of about the same magnitude as the P-II's.
- The 4 MB doesn't fit in the cache -> the processor is waiting for
  memory all the time.
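
The speedups implied by the quoted timings can be checked with a few
lines of arithmetic. This is only a sketch using the numbers from the
question; the helper name to_seconds is made up for the illustration.

```python
def to_seconds(stamp):
    """Convert a 'min:sec' string such as '8:30' into seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)

# Timings quoted in the question (old P-II laptop vs. new Duron desktop).
old_times = ["8:30", "2:30"]
new_times = ["5:15", "1:15"]

# Speedup = old run time divided by new run time.
speedups = [to_seconds(o) / to_seconds(n) for o, n in zip(old_times, new_times)]

# What one would expect if clock speed were the only factor.
clock_ratio = 1600 / 300

for name, s in zip(["process 1", "process 2"], speedups):
    print(f"{name}: {s:.2f}x faster")
print(f"clock ratio: {clock_ratio:.2f}x")
```

Both measured speedups are far below the clock ratio, which is what
one would expect when something other than the CPU clock is the limit.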

> The new PC ought to be 5 times faster (1600 MHz / 300 MHz, right? 

Depends on the job. The memory interface is probably only about
twice as fast (66 MHz vs. 133 MHz), and the cache (which can in
some cases "hide" the slower memory) is also hardly any larger.
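
To make that concrete, here is a crude two-part model, offered only as
a sketch: it assumes that some fraction of the old run time scales with
the memory interface and the rest with the CPU clock. The 66/133 MHz
bus clocks are the guess from the paragraph above, not measured values,
and the function name predicted_speedup is invented for the example.

```python
def predicted_speedup(mem_fraction, cpu_ratio, mem_ratio):
    """Speedup if mem_fraction of the old run time scales with the memory
    interface and the remainder scales with the CPU clock."""
    new_time = mem_fraction / mem_ratio + (1.0 - mem_fraction) / cpu_ratio
    return 1.0 / new_time

CPU_RATIO = 1600 / 300   # clock speeds from the question
MEM_RATIO = 133 / 66     # ASSUMED memory bus clocks (66 MHz vs. 133 MHz)

# Show how the expected gain shrinks as the job becomes more memory bound.
for frac in (0.0, 0.5, 0.9, 1.0):
    s = predicted_speedup(frac, CPU_RATIO, MEM_RATIO)
    print(f"{frac:.0%} memory bound -> {s:.2f}x")
```

Under this model a fully memory-bound program gains only about a
factor of 2, whatever the clock ratio is. Real machines are messier
(latency, cache behaviour), so treat the numbers as an illustration.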

> Of course the speed of the memory is also a factor) but it's not even twice
> as fast.

Which is indeed an indication that it is memory bound (together
with the problem not being OS dependent; I assume you tried some
*nix).

For comparison: I went from a K6-2 500 to an Athlon 1666 (XP2000+),
which is a good factor of 3 in clock speed, and in that case the
compiler compiles itself more than 3 times as fast.

> The execution time pairs are determined from three time stamps that
> occur during one run of the program. The sequence is as follows:
> 
> * Stamp 1
> -Initialize (5-10 secs reading/processing from HD)
> -Process 1 (5-9 mins)
> * Stamp 2
> -Process 2 (1-3 mins)
> * Stamp 3

Since the second process scales better, I assume it accesses
memory in a way that can be cached better.

> Both machines are running Win98 Second Edition (could Windows 98 be
> preventing the faster machine from running at full capacity?

Not for pure calculation, I think. For heavily I/O-bound or
threaded programs 98 may make a huge difference, but if there is a
difference in calculation speed under 98, it won't be more than a
few percent (and since NT and Unix have more to do in the
background, it could even be in 98's favour).

> Or perhaps it's because fpc runs in a DOS window, and the DOS mode is forcing it to
> run slow?)
> 
> The program is very processor intensive. Only about 4MB of memory space
> is used.

You could try to change the memory layout so that consecutive
memory accesses are adjacent in memory, and experiment with
alignment.
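
Why adjacent accesses help can be illustrated with a toy cache
simulation. This is a sketch, not a model of the real Duron or P-II:
the 64 KB direct-mapped cache, the 32-byte lines and the 4-byte
elements are all assumptions chosen for the example, and count_misses
is a made-up helper.

```python
import random

def count_misses(addresses, cache_bytes=64 * 1024, line_bytes=32):
    """Count misses in a toy direct-mapped cache for a list of byte addresses."""
    n_lines = cache_bytes // line_bytes
    tags = [None] * n_lines          # which memory line each cache slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes    # memory line containing this address
        slot = line % n_lines        # direct-mapped: one possible slot per line
        if tags[slot] != line:       # slot holds something else: a miss
            tags[slot] = line
            misses += 1
    return misses

WORKING_SET = 4 * 1024 * 1024        # 4 MB, as in the question
ELEM = 4                             # ASSUMED element size in bytes
N_ACCESSES = 200_000

# Same number of accesses, once walking through memory in order...
sequential = [(i * ELEM) % WORKING_SET for i in range(N_ACCESSES)]

# ...and once jumping around the whole 4 MB (fixed seed for repeatability).
rng = random.Random(1)
scattered = [rng.randrange(WORKING_SET // ELEM) * ELEM for _ in range(N_ACCESSES)]

print("sequential misses:", count_misses(sequential))
print("scattered misses: ", count_misses(scattered))
```

With sequential access, one miss brings in a whole line that serves
the next several accesses; with scattered access over a working set
much larger than the cache, nearly every access has to go to memory.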

You could also try to find/borrow a processor with a large cache 
(e.g. a P-III Xeon with 2 MB cache would be ideal, but an Athlon MP 
or even a simple Athlon would be interesting), and do the test on 
such a machine.

> During runtime, we are doing less than 400 kb of read/write combined to
> the HD. We put about 10 lines of text on the DOS screen to show
> progress. So I can't imagine the I/O could be slowing us down.

Not likely, no.

> I tried compiling with the two different target platforms, but it didn't
> make a difference. Stackchecking is on, but it was on on both computers.

Did you use the same optimization settings on both machines? Maybe
you have -OG3p3r or so in the ppc386.cfg on the P-II (which
automatically adds the heaviest optimizations), and not on the Duron.

> I also tried a few different bios settings (the computer has ready-made
> bios configurations for "Optimal" and "Best Performance" (?) as well as
> the factory default I started with.) But the compile times were the same
> regardless of the bios settings.

Usually this makes a difference of a few percent at most, not orders
of magnitude.

Action list (in the order I would do them, from first to last resort):
1. Verify that you use the same optimization settings on both machines.
2. Try on a machine with more cache.
3. Try to rewrite the program so that it does more of its accesses
   within the same block of memory.


