[fpc-devel] Parallel Computing
Daniël Mantione
daniel.mantione at freepascal.org
Mon Nov 3 16:31:59 CET 2008
On Mon, 3 Nov 2008, Florian Klaempfl wrote:
> Well, those tests even don't take care of thread starting time :)
Threads are started at application startup, in fact my command lines were:
[cvsupport@node001 ~]$ OMP_NUM_THREADS=1 ./stream_omp
[cvsupport@node001 ~]$ OMP_NUM_THREADS=8 ./stream_omp
Threads that are not needed are simply blocked until an OpenMP loop activates them.
> Taking advantage of MT requires always deep knowledge about the used
> architecture and the code being executed and this is something OpenMP
> ignores. For a big vector operation the number of used threads should be
> adapted to the memory architecture
... and bound to the correct cores, i.e.:
[cvsupport@node001 stream]$ OMP_NUM_THREADS=2 numactl --physcpubind=0,4 ./stream_omp
... gives:
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:           8294.5171     0.0139       0.0154       0.0155
Scale:          8191.0001     0.0141       0.0156       0.0157
Add:            7920.1633     0.0218       0.0242       0.0244
Triad:          7990.9738     0.0217       0.0240       0.0241
-------------------------------------------------------------
But:
[cvsupport@node001 stream]$ OMP_NUM_THREADS=2 numactl --physcpubind=0,4 ./stream_omp
... gives:
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:          11603.7546     0.0099       0.0110       0.0110
Scale:         11152.7465     0.0104       0.0115       0.0116
Add:           10795.5704     0.0160       0.0178       0.0179
Triad:         10881.7832     0.0159       0.0176       0.0177
-------------------------------------------------------------
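
To put a number on the difference between the two runs quoted above, here is a minimal Python sketch that divides the second set of rates by the first. The figures are copied from the two tables; the names first_run, second_run and speedups are only labels made up for this illustration.

```python
# Rates in MB/s, copied from the two STREAM result tables above.
first_run = {
    "Copy": 8294.5171,
    "Scale": 8191.0001,
    "Add": 7920.1633,
    "Triad": 7990.9738,
}
second_run = {
    "Copy": 11603.7546,
    "Scale": 11152.7465,
    "Add": 10795.5704,
    "Triad": 10881.7832,
}


def speedups(slow, fast):
    """Return fast/slow for every kernel present in both runs."""
    return {name: fast[name] / slow[name] for name in slow if name in fast}


if __name__ == "__main__":
    for name, ratio in speedups(first_run, second_run).items():
        print(f"{name:6s} {ratio:.2f}x")
```

On these figures the second run is roughly 1.36x to 1.40x faster, depending on the kernel.
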
So, you need knowledge about the underlying NUMA architecture to get the
best performance.
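
Binding can also be done from inside a process. As a rough sketch, assuming Linux (where Python exposes os.sched_setaffinity), the helper below restricts the current process to a requested set of CPUs, much like numactl --physcpubind does for CPU placement. It does not set a memory policy, and the helper name pin_to_cpus is made up for this example.

```python
import os


def pin_to_cpus(requested):
    """Restrict the calling process to those requested CPUs that are
    actually available to it, and return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        raise OSError("CPU affinity is not exposed on this platform")
    allowed = os.sched_getaffinity(0)      # CPUs we may run on right now
    target = set(requested) & allowed      # ignore CPUs we cannot use
    if not target:
        raise ValueError(f"none of {sorted(requested)} is available")
    os.sched_setaffinity(0, target)        # 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # CPU numbers 0 and 4 are taken from the numactl command lines above;
    # which numbers share a NUMA node differs from machine to machine.
    print(sorted(pin_to_cpus({0, 4})))
```

Which CPU numbers sit on which node has to be looked up per machine, for example with numactl --hardware.
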
> for computational intensive applications like Mandelbrot the number of
> threads must be adapted to the number of available virtual cores.
Exactly.
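
As a small sketch of adapting the thread count to the cores that are actually available, the Python helper below counts the logical CPUs the current process may run on and puts that number into OMP_NUM_THREADS. The function names available_cores and omp_environment are assumptions made for this example, not part of any OpenMP tooling.

```python
import os


def available_cores():
    """Number of logical CPUs this process is allowed to run on."""
    if hasattr(os, "sched_getaffinity"):   # Linux: honours numactl/taskset
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1             # portable fallback


def omp_environment(base_env=None):
    """Copy of the environment with OMP_NUM_THREADS set to the core count."""
    env = dict(os.environ if base_env is None else base_env)
    env["OMP_NUM_THREADS"] = str(available_cores())
    return env


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", omp_environment()["OMP_NUM_THREADS"])
```

The resulting dictionary can be handed to subprocess.run(["./stream_omp"], env=omp_environment()) to start the benchmark with a matching thread count.
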
By the way, GCC is totally unsuitable for this benchmark: both its
OpenMP implementation and its loop vectorizer are too weak. You need
the Intel or PathScale compiler to reproduce these results.
Daniël