[fpc-pascal] Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Andrew Brunner andrew.t.brunner at gmail.com
Wed Oct 13 16:19:09 CEST 2010


On Wed, Oct 13, 2010 at 8:27 AM, Jonas Maebe <jonas.maebe at elis.ugent.be> wrote:
>
> 1) on entry of the "critical section" protected by this variable, you can
> have problems, because this sequence:
>
> locked:=true;
> local:=shared_global_var;
>
> may actually be executed in this order:
>
> local:=shared_global_var;
> locked:=true;

Thanks, by the way... Yes, I didn't know that fact until you posted a link in
another thread.  I had a curious problem with pointers and a linked list
that a manager thread maintained but that another server thread could also
read and write.  Once in a while the order of operations would change and
cause read access violations.
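
For the archive, here is a minimal sketch of how I understand the safe
version of that setup should look - guarding the shared list with an RTL
critical section so the manager and server threads can never interleave
their list operations.  The names (ListLock, PushValue, PeekValue) are
made up for the example; this is not the actual code from my project:

program list_cs_sketch;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread manager must be first in the uses clause on Unix
  SysUtils;

type
  PNode = ^TNode;
  TNode = record
    Value: integer;
    Next: PNode;
  end;

var
  ListLock: TRTLCriticalSection;
  Head: PNode = nil;

{ Called by the manager thread to add a node to the front of the list. }
procedure PushValue(AValue: integer);
var
  n: PNode;
begin
  New(n);
  n^.Value := AValue;
  EnterCriticalSection(ListLock);
  try
    n^.Next := Head;
    Head := n;
  finally
    LeaveCriticalSection(ListLock);
  end;
end;

{ Called by the server thread to read the most recently added value. }
function PeekValue(out AValue: integer): boolean;
begin
  EnterCriticalSection(ListLock);
  try
    Result := Head <> nil;
    if Result then
      AValue := Head^.Value;
  finally
    LeaveCriticalSection(ListLock);
  end;
end;

var
  v: integer;
begin
  InitCriticalSection(ListLock);
  try
    PushValue(42);
    if PeekValue(v) then
      writeln('head = ', v);
  finally
    DoneCriticalSection(ListLock);
  end;
end.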

> So you can get speculative reads into the "critical section"
>
> 2) when exiting the "critical section", there are no problems, because none
> of the loads or stores before the one that sets the boolean "lock" variable
> to false, can be moved past that store.

By "into", do you mean inside or outside the section?  I was under the
assumption that inside the section the operations were safe from reordered
reads, but on multi-core systems I'd bet the order of execution can differ.
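
To make both of your points concrete for myself, I put together a small
sketch of how I understand a boolean lock has to be written with the
interlocked and barrier primitives the FPC RTL already ships
(InterlockedCompareExchange, ReadWriteBarrier).  Again, the names are made
up and this is not taken from uThreads.pas:

program spinlock_sketch;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread manager must be first in the uses clause on Unix
  Classes, SysUtils;

var
  FLock: longint = 0;     // 0 = free, 1 = taken
  FShared: longint = 0;   // stands in for the shared state the lock protects

procedure AcquireLock;
begin
  { Point 1: taking the lock with an interlocked operation (a locked
    cmpxchg on x86/x86-64) stops later loads of the shared state from
    being speculated above the point where the lock is really owned. }
  while InterlockedCompareExchange(FLock, 1, 0) <> 0 do
    ThreadSwitch;   // yield to other threads instead of spinning hot
end;

procedure ReleaseLock;
begin
  { Point 2: on x86 the stores made inside the section already cannot move
    past the releasing store; the explicit barrier is belt-and-braces for
    weaker memory models. }
  ReadWriteBarrier;
  FLock := 0;
end;

type
  TWorker = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TWorker.Execute;
var
  i: integer;
begin
  for i := 1 to 100000 do
  begin
    AcquireLock;
    inc(FShared);   // a protected read-modify-write of the shared state
    ReleaseLock;
  end;
end;

var
  a, b: TWorker;
begin
  a := TWorker.Create(false);
  b := TWorker.Create(false);
  a.WaitFor;
  b.WaitFor;
  a.Free;
  b.Free;
  writeln('expected 200000, got ', FShared);
end.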

> In summary, the fact that a particular program runs fine on your particular
> machine does not mean anything:
> a) your particular machine may not perform any kind of reordering that
> results in problems
> b) your particular program may not expose any kind of reordering that
> results in problems

After reading the Wikipedia article and the AMD engineer's blog postings
with suggested code, and considering I'm exclusively using AMD CPUs, I
would say this is true.  Problems could certainly prove difficult to
resolve in cases where worker objects wait for other workers to finish
(recursion and the like), acting as a logic gate - potentially a serious
issue.

> That does not mean that automatically the program "can be used without
> memory barriers". It is virtually impossible to prove correctness of
> multi-threaded code running on multi-cores through testing, and it is
> literally impossible to prove it for all possible machines by testing on a
> single machine (even if that machine has 4096 cores and runs 16000 threads),
> simply because other machines may use different memory consistency models.

After reading up, I would say that memory barriers can be avoided only
under certain circumstances, by engineering for thread isolation (see my
comments in uThreads.pas) and limited access (indexed boolean arrays in
uThreads.pas with no ordering required, because they are polled); and a
good understanding of the challenges is required when coding
multi-threaded apps for multi-core systems.  Sometimes memory barriers
aren't even needed or germane to a particular aspect of an application's
features or functionality.  Knowing which aspects go with which method
is what makes for stable and fast applications.  Lastly, because the
polling concept was already established, I would say that the order of
execution in the architecture set out in my test case proves just that.
Polling for all true (or all false) does not require concern about
premature or false positives.  As designed it reports true if and only
if all threads are complete, and that will remain true on all CPUs.
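
In case it helps anyone following the thread, here is a stripped-down
sketch of the polling pattern I mean (again with made-up names - it is
not the actual uThreads.pas code).  Each worker only ever writes its own
slot of the flag array, and the manager simply re-polls until every slot
reads true, so a stale read can only delay the answer, never produce a
false "all done":

program poll_sketch;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread manager must be first in the uses clause on Unix
  Classes, SysUtils;

const
  WORKER_COUNT = 4;

var
  Done: array[0..WORKER_COUNT - 1] of boolean;

type
  TWorker = class(TThread)
  private
    FIndex: integer;
  protected
    procedure Execute; override;
  public
    constructor Create(AIndex: integer);
  end;

constructor TWorker.Create(AIndex: integer);
begin
  FIndex := AIndex;          // assigned before the thread starts running
  inherited Create(false);
end;

procedure TWorker.Execute;
begin
  Sleep(50 * (FIndex + 1));  // stands in for real work
  Done[FIndex] := true;      // each worker writes only its own flag
end;

function AllDone: boolean;
var
  i: integer;
begin
  for i := 0 to WORKER_COUNT - 1 do
    if not Done[i] then
      exit(false);
  Result := true;
end;

var
  i: integer;
  Workers: array[0..WORKER_COUNT - 1] of TWorker;
begin
  for i := 0 to WORKER_COUNT - 1 do
    Done[i] := false;
  for i := 0 to WORKER_COUNT - 1 do
    Workers[i] := TWorker.Create(i);
  { The manager polls: a flag that is not yet visible only postpones the
    "all done" verdict, it can never make the loop finish too early. }
  while not AllDone do
    Sleep(10);
  writeln('all workers finished');
  for i := 0 to WORKER_COUNT - 1 do
    Workers[i].Free;         // TThread.Destroy waits for the thread to end
end.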

Thanks for the info, help, and discussion.


