[fpc-pascal] Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64

Wed Oct 13 15:27:12 CEST 2010

On 13 Oct 2010, at 00:51, Andrew Brunner wrote:

> The interesting thing I have noticed was that Arrays[n] of boolean can
> be used without memory barriers.  There is not one lock associated
> with the boolean arrays and it always proves non-problematic on a 6
> core system with 4gig ram.  There are boolean value checks that I did
> inside the loops to see if any values were assigned out-of-order and
> over the hours of tests I ran across up to 1200 threads... not one
> false positive!

See also http://en.wikipedia.org/wiki/Memory_ordering#cite_note-table-2
for an overview of the kinds of memory reordering performed by
different architectures. It shows that x86 CPUs perform only one kind
of memory reordering (unless the CPU supports oostore mode and is
explicitly put into it). The reordering performed by default allows a
store that comes before a load in the program code to be executed
after that load instead. This means that if you use a regular variable
(such as a boolean) for synchronisation:

1) on entry of the "critical section" protected by this variable, you  
can have problems, because this sequence:

locked:=true;
local:=shared_global_var;

may actually be executed in this order:

local:=shared_global_var;
locked:=true;

So loads that belong inside the "critical section" can be executed
speculatively, before the store that sets the lock variable.

2) when exiting the "critical section", there are no problems on x86,
because none of the loads or stores that come before the store setting
the boolean "lock" variable to false can be moved past that store.

In summary, the fact that a particular program runs fine on your  
particular machine does not mean anything:
a) your particular machine may not perform any kind of reordering that  
results in problems
b) your particular program may not expose any kind of reordering that  
results in problems

That does not automatically mean that the program "can be used without
memory barriers". It is virtually impossible to prove the correctness
of multi-threaded code running on multiple cores through testing, and
it is literally impossible to prove it for all possible machines by
testing on a single machine (even if that machine has 4096 cores and
runs 16000 threads), simply because other machines may use different
memory consistency models.

Jonas