[fpc-pascal]Execution speed
Marco van de Voort
Marcov at stack.nl
Sun Jan 26 12:53:25 CET 2003
> Now that the list seems to be working again...I'll try posting this again:
>
> I am wondering why our new PC is not executing our fpc-compiled program
> very much faster than the old one. It was really quite a disappointment:
>
> Old PC: Laptop, Intel PII, 300 MHz, 64 MB. Execution times: 8:30, 2:30 (min:sec)
>
> New PC: Desktop, AMD Duron, 1.6 GHz, 128 MB. Execution times: 5:15, 1:15
I'm not the processor expert of the FPC team, but I'll give it a shot.
(Jonas and Florian will probably correct/comment on this heavily :-)
I'm afraid you have fallen for MHz-itis, i.e. the belief that the
throughput of a processor depends purely on its clock speed
(in MHz).
Some important things I noticed immediately from your message:
- there is still a nearly twofold increase (about 1.6x for the first
time, exactly 2x for the second)
- you use 4 MB of memory, and I assume from your description that the
access pattern is fairly random
- the Duron has less cache than an Athlon, and the Duron's is probably
about the same size as the P-II's
- the 4 MB doesn't fit in the cache -> the processor is waiting for
memory all the time.
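To put numbers on the first point, here is a small sketch (in Python, just for the arithmetic) that converts the quoted min:sec timings to seconds and compares the measured speedups with the raw clock ratio. The helper name `to_seconds` is mine, not anything from the original program.

```python
def to_seconds(mmss):
    """Convert a 'min:sec' string such as '8:30' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Timings quoted in the original message.
old_times = ["8:30", "2:30"]   # P-II, 300 MHz
new_times = ["5:15", "1:15"]   # Duron, 1.6 GHz

# Measured speedup = old time / new time, per timing.
speedups = [to_seconds(o) / to_seconds(n) for o, n in zip(old_times, new_times)]

# What one would expect if run time scaled with clock speed alone.
clock_ratio = 1600 / 300

for label, s in zip(["first time", "second time"], speedups):
    print(f"{label}: {s:.2f}x faster")
print(f"clock ratio: {clock_ratio:.2f}x")
```

The measured speedups come out at roughly 1.6x and exactly 2x, against a clock ratio of about 5.3x.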
> The new PC ought to be 5 times faster (1600 MHz / 300 MHz, right?
Depends on the job. The memory interface is probably only about two
times faster (66 MHz <-> 133 MHz), and the cache (which can in
some cases "hide" the slower memory) is also hardly larger.
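As a rough illustration of why the job matters, here is an Amdahl-style back-of-the-envelope model in Python. It assumes (my simplification, not a measurement) that some fraction of the old run time is spent waiting on memory and scales with the memory interface, while the rest scales with the CPU clock. The function name and the sample fractions are hypothetical.

```python
def overall_speedup(mem_fraction, cpu_ratio, mem_ratio):
    """Estimate overall speedup when mem_fraction of the old run time
    scales with mem_ratio and the remainder scales with cpu_ratio."""
    new_time = mem_fraction / mem_ratio + (1.0 - mem_fraction) / cpu_ratio
    return 1.0 / new_time

CPU_RATIO = 1600 / 300   # clock ratio of the two machines
MEM_RATIO = 2.0          # assumed: memory interface about twice as fast

# Hypothetical memory-bound fractions, from pure CPU work to pure memory waits.
for f in (0.0, 0.5, 0.9, 1.0):
    s = overall_speedup(f, CPU_RATIO, MEM_RATIO)
    print(f"memory-bound fraction {f:.1f}: {s:.2f}x")
```

In this simple model a fully memory-bound job gains only the factor of two that the memory interface provides, however fast the CPU is.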
> Of course the speed of the memory is also a factor) but it's not even twice
> as fast.
Which is exactly why I think it is memory bound (that, and the
problem apparently not being OS dependent; I assume you tried some *nix).
I went from a K6-2 500 to an Athlon 1666 (XP2000+), which is
a bit more than a factor of 3 in clock speed, and the compiler does
compile itself more than 3 times as fast.
> The execution time pairs are determined from three time stamps that
> occur during one run of the program. The sequence is as follows:
>
> * Stamp 1
> -Initialize (5-10 secs reading/processing from HD)
> -Process 1 (5-9 mins)
> * Stamp 2
> -Process 2 (1-3 mins)
> * Stamp 3
Since the second process scales better, I assume it accesses
memory in a way that can be cached better.
> Both machines are running Win98 Second Edition (could Windows 98 be
> preventing the faster machine from running at full capacity?
Not for pure calculation, I think. For heavily I/O-bound or
multithreaded programs 98 might make a huge difference, but any
difference in raw calculation speed under 98 won't be more than a few
percent (and since NT and Unix have more to do in the background,
it could even be in 98's favour).
> Or perhaps it's because fpc runs in a DOS window, and the DOS mode is forcing it to
> run slow?)
>
> The program is very processor intensive. Only about 4MB of memory space
> is used.
You could try to change the memory layout so that
consecutive memory accesses are adjacent in memory, and play
with alignments.
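To show why adjacency matters, here is a toy model in Python (a sketch, not a benchmark). It counts misses in a made-up direct-mapped cache while walking a 4 MB array of 4-byte elements in row order and then in column order. The cache geometry (32 KB, 32-byte lines) and the 1024 x 1024 array shape are assumptions for illustration only, not the real P-II or Duron parameters or the layout of your program.

```python
def count_misses(addresses, line_size=32, num_lines=1024):
    """Count misses in a toy direct-mapped cache.
    line_size and num_lines are hypothetical (32 KB cache in total)."""
    tags = [None] * num_lines          # which memory line each slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_size       # memory line containing this byte
        slot = line % num_lines        # the one slot that line can occupy
        if tags[slot] != line:         # not cached: fetch it, count a miss
            tags[slot] = line
            misses += 1
    return misses

ROWS, COLS, ELEM = 1024, 1024, 4       # 1024 x 1024 four-byte elements = 4 MB

# Byte addresses visited in row order (adjacent) and column order (strided).
row_order = (ELEM * (r * COLS + c) for r in range(ROWS) for c in range(COLS))
col_order = (ELEM * (r * COLS + c) for c in range(COLS) for r in range(ROWS))

row_misses = count_misses(row_order)
col_misses = count_misses(col_order)
print("row order misses:   ", row_misses)
print("column order misses:", col_misses)
```

In this model the row-order walk misses once per cache line (every eighth element), while the column-order walk misses on every single access, so the same 4 MB costs eight times as many trips to main memory.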
You could also try to find/borrow a processor with a large cache
(e.g. a P-III Xeon with 2 MB cache would be ideal, but an Athlon MP
or even a simple Athlon would be interesting), and do the test on
such a machine.
> During runtime, we are doing less than 400 kb of read/write combined to
> the HD. We put about 10 lines of text on the DOS screen to show
> progress. So I can't imagine the I/O could be slowing us down.
Not likely, no.
> I tried compiling with the two different target platforms, but it didn't
> make a difference. Stackchecking is on, but it was on on both computers.
Did you use the same level of optimization? Maybe you
have -OG3p3r or so in the ppc386.cfg on the P-II (which
automatically adds the heaviest optimizations), and not on the Duron.
> I also tried a few different bios settings (the computer has ready-made
> bios configurations for "Optimal" and "Best Performance" (?) as well as
> the factory default I started with.) But the compile times were the same
> regardless of the bios settings.
Usually this makes a difference of a few percent at most, not whole factors.
Action list (in the order I would do them, from first to last resort):
1 Verify that you use the same degree of optimization on both machines.
2 Try on a machine with more cache.
3 Try to rewrite the program to do more accesses to the same block of
memory.