[fpc-devel] Parallel Computing
Peter Popov
ppopov at tamu.edu
Mon Nov 3 19:33:11 CET 2008
Hi all
I would like to point out that these days there is huge hype around
multicore systems. As a result one sees stupid parallel demonstrations,
such as the Mandelbrot one. This is a pure graphics demo with no other
utility, remote from any realistic parallel application. Moreover, this
example is one of the few that can be run efficiently on a present-day
multicore system. So, if the FPC team is really intent on bringing
parallel constructs into the language, it has to look at things from a
broader perspective. It is necessary to think about what realistic
parallel applications would look like.
Unfortunately, the utility of multicore systems has been largely
exaggerated by their manufacturers. The main problem is that multiple
cores share the same memory bandwidth. As a result it is highly unlikely
that one can have COMPLEX programs running concurrently on a multicore
system without clogging the memory bus and using up all the cache.
Multiple cores are useful if there is little memory transfer (which does
not happen often, except of course if you compute fractals), or if memory
transfer is done in a predictable fashion. About the only examples of the
latter are linear algebra subroutines (scientific computing) and certain
multimedia applications (concurrent MPEG decoders, for example).
Now, it is true that a dot product or vector sum can be done very
elegantly with a parallel loop. However, these are very low-level
operations, ones that (at least in the scientific computing community) are
typically optimized for each particular architecture and provided as a
user API. Consequently, no one would go around writing matrix-vector
multiplication in a high-level language. Linear algebra is the usual
bottleneck, and in real applications it has already been written and
optimized. So parallel loops look beautiful, but they are of little
practical utility. In summary, the programming style that led to
assembly-level loop unrolling for superscalar processors is likely to be
the same programming style that will be used for multicore machines.
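For concreteness, the kind of low-level kernel in question looks like this
in Object Pascal (an illustrative sketch of my own, not FPC library code);
in practice one would call a vendor-tuned BLAS routine such as ddot rather
than parallelize this loop by hand:

```pascal
{ A plain dot product: one multiply-add per pair of operands fetched
  from memory, so the loop is bandwidth-bound, not compute-bound.
  Splitting it across cores that share one memory bus gains little. }
function DotProduct(const A, B: array of Double): Double;
var
  I: Integer;
begin
  Result := 0.0;
  for I := 0 to High(A) do
    Result := Result + A[I] * B[I];
end;
```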
So typical parallel code revolves around higher-level algorithms. For
example, if you want to compute the heat distribution in an automobile
engine, you would first partition the engine into many smaller
components. Then you would perform complex, memory-intensive computations
on each piece, and finally patch the results together. It is questionable
whether multicore systems are useful in such a scenario, as it involves
large memory transfers. However, if they are (or you have a real
multi-processor shared-memory machine), then what you would need from the
language is a nice encapsulation of threads. This implies (local)
parallel procedures, arrays of (local) parallel procedures, parallel
class methods, semaphores and critical sections.
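The partition / solve / patch pattern described above can be sketched with
today's TThread as follows (a hypothetical example: the class name, the
index-range partitioning and the empty solve body are placeholders I chose
for illustration):

```pascal
uses
  Classes;

type
  { One worker thread per subdomain of the partitioned engine. }
  TSubdomainSolver = class(TThread)
  private
    FLo, FHi: Integer;   // index range of this partition
  protected
    procedure Execute; override;
  public
    constructor Create(ALo, AHi: Integer);
  end;

constructor TSubdomainSolver.Create(ALo, AHi: Integer);
begin
  FLo := ALo;
  FHi := AHi;
  inherited Create(False);   // start running immediately
end;

procedure TSubdomainSolver.Execute;
begin
  { the memory-intensive local solve on cells FLo..FHi goes here }
end;

procedure SolveInParallel(NParts, NCells: Integer);
var
  Workers: array of TSubdomainSolver;
  I, Step: Integer;
begin
  SetLength(Workers, NParts);
  Step := NCells div NParts;
  for I := 0 to NParts - 1 do
    Workers[I] := TSubdomainSolver.Create(I * Step, (I + 1) * Step - 1);
  for I := 0 to NParts - 1 do
  begin
    Workers[I].WaitFor;   // join, then patch subdomain boundaries
    Workers[I].Free;
  end;
end;
```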
The present obstacle with Object Pascal is that for each (class) method
that implements a parallel algorithm, one has to separately implement a
thread object. Moreover, parallel algorithms typically need shared
variables (the ones they operate on in parallel), so the method's local
variables have to be moved into the thread object as well. In the end,
the implementation of your algorithm is split between the method and the
thread object. Finally, synchronization is provided by classes (TEvent,
TCriticalSection) which have to be constructed and destructed explicitly,
with the attendant resource-protection (try..finally) overhead. This is
not convenient.
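Concretely, the split looks roughly like this (a minimal sketch: TSolver
and the trivial "work" are placeholders of mine, while TThread,
TCriticalSection and the try..finally blocks are the real FPC machinery
one is forced to write):

```pascal
uses
  Classes, SyncObjs;

type
  TSolver = class;

  { The algorithm cannot live in TSolver.Solve alone: a separate
    TThread descendant has to be declared, and what would naturally
    be local variables of Solve become fields of the thread object. }
  TSolveThread = class(TThread)
  private
    FOwner: TSolver;
    FPartial: Double;   // conceptually a local variable of Solve
  protected
    procedure Execute; override;
  public
    constructor Create(AOwner: TSolver);
  end;

  TSolver = class
  private
    FLock: TCriticalSection;
    FTotal: Double;
  public
    procedure Solve;
  end;

constructor TSolveThread.Create(AOwner: TSolver);
begin
  FOwner := AOwner;
  inherited Create(False);   // start running immediately
end;

procedure TSolveThread.Execute;
begin
  FPartial := 1.0;           // stand-in for the real computation
  FOwner.FLock.Acquire;
  try
    FOwner.FTotal := FOwner.FTotal + FPartial;
  finally
    FOwner.FLock.Release;
  end;
end;

procedure TSolver.Solve;
var
  T: TSolveThread;
begin
  FLock := TCriticalSection.Create;   // explicit construction...
  try
    T := TSolveThread.Create(Self);
    T.WaitFor;
    T.Free;
  finally
    FLock.Free;                       // ...and explicit destruction
  end;
end;
```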
I hope this helps the discussion.
Peter Popov