[fpc-pascal] Multi-threaded project with few locks (no Thread.waitfor). Memory consumption keeps increasing on Ubuntu 10.10 x64
Jonas Maebe
jonas.maebe at elis.ugent.be
Wed Oct 13 15:27:12 CEST 2010
On 13 Oct 2010, at 00:51, Andrew Brunner wrote:
> The interesting thing I have noticed was that Arrays[n] of boolean can
> be used without memory barriers. There is not one lock associated
> with the boolean arrays and it always proves non-problematic on a 6
> core system with 4gig ram. There are boolean value checks that I did
> inside the loops to see if any values were assigned out-of-order and
> over the hours of tests I ran across up to 1200 threads... not one
> false positive!
See also http://en.wikipedia.org/wiki/Memory_ordering#cite_note-table-2
for an overview of the kinds of memory reordering performed by
different architectures. It shows that x86 CPUs perform only one kind
of memory reordering (unless the CPU supports, and is explicitly put
into, OOStore mode). By default, a store that comes before a load
(from a different location) in the program code may be executed after
that load instead. This means that if you use a regular variable (such
as a boolean) for synchronisation:
1) on entry of the "critical section" protected by this variable, you
can have problems, because this sequence:
locked:=true;
local:=shared_global_var;
may actually be executed in this order:
local:=shared_global_var;
locked:=true;
So loads that belong inside the "critical section" can be executed
speculatively, before the "lock" variable is visibly set.
2) when exiting the "critical section", there are no problems, because
none of the loads or stores that come before the store that sets the
boolean "lock" variable to false can be moved past that store.
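
As a rough illustration of the two cases above, here is a minimal sketch
of the same pattern with explicit ordering. It is only an assumption of
how one might write it, not code from the original program: the names
Locked, SharedGlobalVar, TryAcquire and Release are made up for the
example, and it assumes an FPC version whose System unit provides
InterlockedExchange and ReadWriteBarrier.

```pascal
program BarrierSketch;

{$mode objfpc}

var
  // Hypothetical lock word: 0 = free, 1 = taken.
  Locked: LongInt = 0;
  // Stands in for shared_global_var from the example above.
  SharedGlobalVar: LongInt = 42;

// Try to take the lock once. InterlockedExchange is an atomic
// read-modify-write, so two threads cannot both see the old value 0.
// The explicit barrier afterwards is deliberately conservative: it asks
// that loads inside the "critical section" are not executed ahead of
// the exchange, without relying on what a given CPU does implicitly.
function TryAcquire: Boolean;
begin
  Result := InterlockedExchange(Locked, 1) = 0;
  ReadWriteBarrier;
end;

// Leave the "critical section". The barrier keeps earlier loads and
// stores ahead of the store that clears the lock word. On x86 this
// ordering already holds by default (case 2 above); other
// architectures may need it.
procedure Release;
begin
  ReadWriteBarrier;
  Locked := 0;
end;

var
  Local: LongInt;
begin
  if TryAcquire then
  begin
    Local := SharedGlobalVar;  // load now ordered after the lock store
    Release;
    WriteLn(Local);
  end;
end.
```

The sketch only tries the lock once instead of spinning, to keep it
short. In real code a critical section (TRTLCriticalSection) is usually
the simpler choice, since it takes care of the required barriers itself.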
In summary, the fact that a particular program runs fine on your
particular machine does not prove anything:
a) your particular machine may not perform any kind of reordering that
results in problems
b) your particular program may not expose any kind of reordering that
results in problems
That does not automatically mean that the program "can be used without
memory barriers". It is virtually impossible to prove correctness of
multi-threaded code running on multi-cores through testing, and it is
literally impossible to prove it for all possible machines by testing
on a single machine (even if that machine has 4096 cores and runs
16000 threads), simply because other machines may use different memory
consistency models.
Jonas