[fpc-devel] Light weight threads for FPC

Fri Dec 14 15:09:22 CET 2007

On Fri, 14 Dec 2007 12:36:51 +0000
Mark Morgan Lloyd <markMLl.fpc-devel at telemetry.co.uk> wrote:

> Mattias Gaertner wrote:
> > On Fri, 14 Dec 2007 10:29:59 +0100
> > Michael Schnell <mschnell at lumino.de> wrote:
> > 
> >> Mattias Gaertner wrote:
> >>> Has someone already created a unit for light weight threads?
> >>>   
> >> What do you mean by "light weight threads" ? How can it get
> >> "lighter" than TThread, that offers close to no built-in
> >> "comfort"-functionality ?
> > 
> > Hehe. I mainly choose the term to provoke.
> > 
> > TThread is a class and has more functionality than needed, but
> > misses the ability to work in teams. For many parallel algorithms
> > you don't need events, priority or synchronize. But you need to
> > easily and fast start a set of threads with IDs 0..N. N can be
> > computed from the number of cores or IO queues. But many algorithms
> > simply compute N from the input size (e.g. N=n/log(n)). A pool with
> > threads, each running and eating tasks in loops are 'lighter' than
> > starting N real threads. For example starting a procedure "100
> > times in parallel" means on a dual core: Two threads execute the
> > procedure 50 times. This is much lighter than 100 TThreads. And if
> > one of the threads 'runs slower' than it will automatically load
> > balance.
> > 
> > So, I hope that someone already implemented some 'light weight'
> > threads for FPC and I can adapt and extend them.
> 
> On the other hand, particularly in the case where you are running a
> large number of iterations of a parallelised job it's useful to be
> able to time each thread so that the distribution of work can be
> optimised "on the fly".
> 
> As an example, looking at a big cellular automaton you might have a
> few hundred million cells and want to run for a billion or so
> generations. I think it's going to be significantly more efficient to
> have a set of N threads waiting for their worklists, on completion
> looking at the time each thread took and optimising the next
> generation than starting a pool of threads per generation.

This is true for static load balancing. 
But the thread pool would do dynamic load balancing. The jobs will not
be distributed at the beginning, but on the fly.
Each thread should run something like this:

while GetNextTask do ExecuteTask;

When starting a procedure 100 times, the pool will "add" 100 tasks and
wake up the sleeping threads. The threads will then process the tasks.
There is no guarantee that a task will run in a specific thread. If a
task fetches a small task, it will start earlier with the next. So if
you are able to split up the problem in enough small tasks, you get
dynamic load balancing for free.

> One thing that would be useful would be a processor-affinity property
> on the basic TThread.

See the OpenMP specs for more possible tuning parameters.
The pool can be extended by some of these abilities later, but for now
I only need a simple pool to demonstrate/test FPC with parallel
algorithms.

Mattias