[fpc-devel] Parallel Computing
Peter Popov
ppopov at tamu.edu
Mon Nov 3 19:33:11 CET 2008
Hi all
I would like to point out that these days there is huge hype around
multicore systems. As a result one sees stupid parallel demonstrations,
such as the Mandelbrot one. This is a pure graphics demo with no other
utility, remote from any realistic parallel application. Moreover, this
example is one of the few that can be run efficiently on a present-day
multicore system. So, if the FPC team is really intent on bringing
parallel constructs into the language, it has to look at things from a
broader perspective. It is necessary to think about what realistic
parallel applications would look like.
Unfortunately, the utility of multicore systems has been largely
exaggerated by their manufacturers. The main problem is that multiple
cores share the same memory bandwidth. As a result it is highly unlikely
that one can have COMPLEX programs running concurrently on a multicore
system without clogging the memory bus and using up all the cache.
Multiple cores are useful if there is little memory transfer (which does
not happen often, except of course if you compute fractals), or if memory
transfer is done in a predictable fashion. About the only examples of the
latter are linear algebra subroutines (scientific computing) and certain
multimedia applications (concurrent MPEG decoders, for example).
Now, it is true that a dot product or vector sum can be done very
elegantly with a parallel loop. However, these are very low-level
operations, ones that (at least in the scientific computing community) are
typically optimized for each particular architecture and provided as a
user API. Consequently, no one would go around writing matrix-vector
multiplication in a high-level language. Linear algebra is the usual
bottleneck, and in real applications it has already been written and
optimized. So parallel loops look beautiful, but they are of little
practical utility. In summary, the programming style that led to
assembly-level loop unrolling for superscalar processors is likely to be
the same programming style that will be used for multicore machines.
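For concreteness, the kind of low-level kernel in question looks like this
in Object Pascal (an illustrative sketch of my own, not FPC library code);
in practice one would call a vendor-tuned BLAS routine such as ddot rather
than parallelize this loop by hand:

```pascal
{ A plain dot product: one multiply-add per pair of operands fetched
  from memory, so the loop is bandwidth-bound, not compute-bound.
  Splitting it across cores that share one memory bus gains little. }
function DotProduct(const A, B: array of Double): Double;
var
  I: Integer;
begin
  Result := 0.0;
  for I := 0 to High(A) do
    Result := Result + A[I] * B[I];
end;
```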
So typical parallel code revolves around higher-level algorithms. For
example, if you want to compute the heat distribution in an automobile
engine, you would first partition the engine into many smaller
components. Then you would perform complex, memory-intensive computations
on each piece, and finally patch the results together. It is questionable
whether multicore systems are useful in such a scenario, as it involves
large memory transfers. However, if they are (or you have a real
multi-processor shared-memory machine), then what you would need from the
language is a nice encapsulation of threads. This implies (local)
parallel procedures, arrays of (local) parallel procedures, parallel
class methods, semaphores and critical sections.
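The partition / solve / patch pattern described above can be sketched with
today's TThread as follows (a hypothetical example: the class name, the
index-range partitioning and the empty solve body are placeholders I chose
for illustration):

```pascal
uses
  Classes;

type
  { One worker thread per subdomain of the partitioned engine. }
  TSubdomainSolver = class(TThread)
  private
    FLo, FHi: Integer;   // index range of this partition
  protected
    procedure Execute; override;
  public
    constructor Create(ALo, AHi: Integer);
  end;

constructor TSubdomainSolver.Create(ALo, AHi: Integer);
begin
  FLo := ALo;
  FHi := AHi;
  inherited Create(False);   // start running immediately
end;

procedure TSubdomainSolver.Execute;
begin
  { the memory-intensive local solve on cells FLo..FHi goes here }
end;

procedure SolveInParallel(NParts, NCells: Integer);
var
  Workers: array of TSubdomainSolver;
  I, Step: Integer;
begin
  SetLength(Workers, NParts);
  Step := NCells div NParts;
  for I := 0 to NParts - 1 do
    Workers[I] := TSubdomainSolver.Create(I * Step, (I + 1) * Step - 1);
  for I := 0 to NParts - 1 do
  begin
    Workers[I].WaitFor;   // join, then patch subdomain boundaries
    Workers[I].Free;
  end;
end;
```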
The present obstacle with Object Pascal is that for each (class) method
that implements a parallel algorithm, one has to separately implement a
thread object. Moreover, parallel algorithms typically need shared
variables (the ones they operate on in parallel), so the method's local
variables have to be moved into the thread object as well. In the end,
the implementation of your algorithm is split between the method and the
thread object. Finally, synchronization is provided by classes (TEvent,
TCriticalSection) which have to be constructed and destructed explicitly,
with the attendant resource-protection (try..finally) overhead. This is
not convenient.
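Concretely, the split looks roughly like this (a minimal sketch: TSolver
and the trivial "work" are placeholders of mine, while TThread,
TCriticalSection and the try..finally blocks are the real FPC machinery
one is forced to write):

```pascal
uses
  Classes, SyncObjs;

type
  TSolver = class;

  { The algorithm cannot live in TSolver.Solve alone: a separate
    TThread descendant has to be declared, and what would naturally
    be local variables of Solve become fields of the thread object. }
  TSolveThread = class(TThread)
  private
    FOwner: TSolver;
    FPartial: Double;   // conceptually a local variable of Solve
  protected
    procedure Execute; override;
  public
    constructor Create(AOwner: TSolver);
  end;

  TSolver = class
  private
    FLock: TCriticalSection;
    FTotal: Double;
  public
    procedure Solve;
  end;

constructor TSolveThread.Create(AOwner: TSolver);
begin
  FOwner := AOwner;
  inherited Create(False);   // start running immediately
end;

procedure TSolveThread.Execute;
begin
  FPartial := 1.0;           // stand-in for the real computation
  FOwner.FLock.Acquire;
  try
    FOwner.FTotal := FOwner.FTotal + FPartial;
  finally
    FOwner.FLock.Release;
  end;
end;

procedure TSolver.Solve;
var
  T: TSolveThread;
begin
  FLock := TCriticalSection.Create;   // explicit construction...
  try
    T := TSolveThread.Create(Self);
    T.WaitFor;
    T.Free;
  finally
    FLock.Free;                       // ...and explicit destruction
  end;
end;
```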
I hope this helps the discussion.
Peter Popov