[fpc-devel] Parallel Computing

Daniël Mantione daniel.mantione at freepascal.org
Mon Nov 3 16:01:42 CET 2008



On Mon, 3 Nov 2008, Florian Klaempfl wrote:

> Michael Schnell wrote:
>> IMHO any technology that enables FPC to compile a loop like (using
>> Oxygen syntax):
>>
>> for parallel i := 0 to 10 do begin
>>  a[i] := a[i] + b[i];
>> end;
>>
>> so that it runs as fast on a multicore processor as the
>> equivalent GNU C construct:
>>
>> #pragma omp parallel for
>> for (i=0; i<=10; i++) {
>>  a[i] = a[i] + b[i];
>> };
>>
>> would be fine.
>
> Great and you really believe this accelerates a program? Starting a
> thread takes a lot of time and such loops are usually memory throughput
> bound.
>
> Nice toy example without any real use.

While I largely agree with you, it's nice to show some numbers.

Let's do a comparison using Stream on an Opteron 2354 system. This is
on 1 core:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5881.0033       0.0218       0.0218       0.0218
Scale:       5873.7327       0.0218       0.0218       0.0219
Add:         5594.5421       0.0343       0.0343       0.0343
Triad:       5521.5076       0.0348       0.0348       0.0348
-------------------------------------------------------------
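
For context, the Add row measures a loop of the same shape as the one
quoted above. What follows is a minimal sketch in C, not the actual Stream
source: the function names (stream_add, add_bytes_moved, rate_mb_per_s)
are made up for illustration, and the byte count assumes the usual Stream
accounting of two arrays read and one written per pass.

```c
/* Sketch of a Stream-style "Add" kernel: c[i] = a[i] + b[i].
   The pragma is honored only when compiled with OpenMP enabled
   (e.g. gcc -fopenmp); otherwise the loop simply runs serially. */
void stream_add(double *c, const double *a, const double *b, long n)
{
    long i;
#pragma omp parallel for
    for (i = 0; i < n; i++) {
        c[i] = a[i] + b[i];
    }
}

/* Bytes moved per Add pass under the assumed accounting:
   two arrays read plus one array written. */
double add_bytes_moved(long n)
{
    return 3.0 * (double)sizeof(double) * (double)n;
}

/* Bandwidth in MB/s, with 1 MB taken as 10^6 bytes. */
double rate_mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1.0e6;
}
```

Each iteration does one floating point addition for 24 bytes of memory
traffic, which is why a loop like this tends to be limited by the memory
system rather than by the cores.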

This is Stream on 8 cores:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       15748.1714       0.0073       0.0081       0.0081
Scale:      15767.5971       0.0073       0.0081       0.0081
Add:        15498.8812       0.0112       0.0124       0.0124
Triad:      15557.8681       0.0111       0.0123       0.0124
-------------------------------------------------------------

So we get a speedup of roughly 2.7 to 2.8 (call it 3-fold) for an 8-fold
increase in processing capacity. The reason is that memory bandwidth is
the bottleneck. On Intel the situation is much worse, about 1.55-fold, as
the numbers below show. This is a Xeon 5420 on Blackford. 1 core:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        3923.3765       0.0294       0.0326       0.0328
Scale:       3928.0555       0.0294       0.0326       0.0327
Add:         3929.8382       0.0441       0.0489       0.0490
Triad:       3944.2933       0.0440       0.0487       0.0489
-------------------------------------------------------------

8 cores:

-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        6117.1414       0.0189       0.0209       0.0210
Scale:       6116.7929       0.0189       0.0209       0.0210
Add:         6081.3180       0.0285       0.0316       0.0317
Triad:       6124.9809       0.0283       0.0313       0.0315
-------------------------------------------------------------
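
The scaling can be read off the tables by dividing the 8-core rate by the
1-core rate. A small sketch of that arithmetic in C follows; the helper
names speedup and efficiency are hypothetical and not part of Stream.

```c
/* Speedup: bandwidth on many cores divided by bandwidth on one core. */
double speedup(double rate_multi, double rate_single)
{
    return rate_multi / rate_single;
}

/* Parallel efficiency: fraction of ideal linear scaling achieved. */
double efficiency(double rate_multi, double rate_single, int cores)
{
    return speedup(rate_multi, rate_single) / (double)cores;
}
```

With the Copy rows above, speedup(15748.1714, 5881.0033) is about 2.68 on
the Opteron (efficiency about 33% on 8 cores), and
speedup(6117.1414, 3923.3765) is about 1.56 on the Xeon (efficiency about
19%).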

Daniël

