[fpc-devel] Parallel Computing
Daniël Mantione
daniel.mantione at freepascal.org
Mon Nov 3 16:01:42 CET 2008
On Mon, 3 Nov 2008, Florian Klaempfl wrote:
> Michael Schnell wrote:
>> IMHO any technology that enables FPC to compile a loop like (using
>> Oxygen syntax):
>>
>> for parallel i := 0 to 10 do begin
>>   a[i] := a[i] + b[i];
>> end;
>>
>> in a way that it runs on a multicore processor as fast as the
>> equivalent GNU C construct:
>>
>> #pragma omp parallel for
>> for (i=0; i<=10; i++) {
>>     a[i] = a[i] + b[i];
>> }
>>
>> would be fine.
>
> Great, and you really believe this accelerates a program? Starting a
> thread takes a lot of time and such loops are usually memory throughput
> bound.
>
> Nice toy example without any real use.
While I largely agree with you, it's nice to show some numbers.
Let's do a comparison using Stream on an Opteron 2354 system. This is
on 1 core:
---------------------------------------------------------------
Function  Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:       5881.0033        0.0218        0.0218        0.0218
Scale:      5873.7327        0.0218        0.0218        0.0219
Add:        5594.5421        0.0343        0.0343        0.0343
Triad:      5521.5076        0.0348        0.0348        0.0348
---------------------------------------------------------------
This is Stream on 8 cores:
---------------------------------------------------------------
Function  Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:      15748.1714        0.0073        0.0081        0.0081
Scale:     15767.5971        0.0073        0.0081        0.0081
Add:       15498.8812        0.0112        0.0124        0.0124
Triad:     15557.8681        0.0111        0.0123        0.0124
---------------------------------------------------------------
So we get roughly a 3-fold speedup (2.7x to 2.8x) for an 8-fold increase in
processing capacity. The reason is that memory bandwidth is the bottleneck.
On Intel the situation is much worse: 8 cores give only about a 1.55-fold
speedup. This is a Xeon 5420 on the Blackford chipset. 1 core:
---------------------------------------------------------------
Function  Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:       3923.3765        0.0294        0.0326        0.0328
Scale:      3928.0555        0.0294        0.0326        0.0327
Add:        3929.8382        0.0441        0.0489        0.0490
Triad:      3944.2933        0.0440        0.0487        0.0489
---------------------------------------------------------------
8 cores:
---------------------------------------------------------------
Function  Rate (MB/s)  Avg time (s)  Min time (s)  Max time (s)
Copy:       6117.1414        0.0189        0.0209        0.0210
Scale:      6116.7929        0.0189        0.0209        0.0210
Add:        6081.3180        0.0285        0.0316        0.0317
Triad:      6124.9809        0.0283        0.0313        0.0315
---------------------------------------------------------------
Daniël