[fpc-other] Re: [fpc-devel] volatile variables
Vinzent Höfler
JeLlyFish.software at gmx.net
Sat Jul 2 09:50:28 CEST 2011
On Thu, 30 Jun 2011 11:31:23 +0200, Hans-Peter Diettrich
<DrDiettrich1 at aol.com> wrote:
> Vinzent Höfler wrote:
[pragma Volatile of Ada05]
> But what would this mean to FPC code in general (do we *need* such
> attributes?), and what will be their speed impact? This obviously
> depends on the effects of the specific synchronizing instructions,
> inserted by the compiler.
I think we should have something to the effect of an "Atomic" attribute. As I
already pointed out, "Volatile" does not help us here.
Static semantics should be that any variable declared as "atomic" must be
properly aligned by the compiler and - of course - must be guaranteed to
have atomic access (and by that I mean on the processor level, not via
higher-level locking primitives; so basically, the variable must not be
larger than the processor's word size).
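For illustration only (this is C11, not existing FPC syntax - FPC has no such
attribute today, and the function name is my own invention), a word-sized
`_Atomic` variable gives exactly those two guarantees: aligned, indivisible
access, and by default sequentially consistent ordering:

```c
#include <stdatomic.h>

/* Word-sized and properly aligned; every access is indivisible and,
 * with the default memory order, also sequentially consistent. */
static _Atomic int counter = 0;

/* An indivisible read-modify-write: no torn reads or writes,
 * no update lost even under concurrent increments. */
static int increment_and_read(void)
{
    atomic_fetch_add(&counter, 1);
    return atomic_load(&counter);
}
```

A hypothetical Pascal spelling might be "var Counter: LongInt; atomic;", but
that syntax is purely made up here.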
Dynamic semantics should be that the compiler inserts the proper
synchronisation constructs (i.e. a read-write memory barrier) when accessing
such a variable.
This would include Volatile semantics, of course. Or, to quote from the
Ada RM again, the actions concerning atomic variables shall be sequential:
|C.6(17): Two actions are sequential (see 9.10) if each is the read or
| update of the same atomic object.
where being "sequential" is defined as:
|9.10(12ff.): One action signals the other;
| Both actions occur as part of the execution of the same task;
| Both actions occur as part of protected actions on the same
| protected object [...]
|9.10(15): A pragma Atomic [...] ensure[s] that certain reads and updates
| are sequential [...].
I understand that FPC added memory-barrier subroutines a while ago, but I
still don't like the idea of having to write all this memory-barrier stuff
myself. I mean, I have several years of experience with multi-threaded
programming, so I know about locking and I am not totally lost, but I must
admit, I still don't fully grasp all that multi-core stuff and the
ever-weaker memory models that come with it.
So, expecting such a low-level understanding from Mr. Average Joe Pascal
programmer, who just wants to make use of his octa-core, would be too much,
I'd say.
I think a write barrier would be sufficient when reading such a variable, to
ensure all pending writes are executed (reads can't have an effect on it,
can they?), and when writing it, no special instructions would be needed.
Hmm, that can't be right; I'm surely missing something here.
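For what it's worth, the usual answer (standard acquire/release practice,
nothing FPC-specific, and all names below are mine) is that *both* sides need
a barrier: the writer needs a release (write) barrier after storing the data,
and the reader needs a matching acquire (read) barrier after seeing the flag.
In C11 terms:

```c
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

static int payload;            /* ordinary, non-atomic data */
static _Atomic int ready;      /* publication flag          */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                        /* 1: write the data  */
    atomic_store_explicit(&ready, 1,
                          memory_order_release);         /* 2: release barrier */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&ready,
                                 memory_order_acquire))  /* acquire barrier    */
        ;                                                /* spin until set     */
    /* The acquire pairs with the producer's release, so the
     * payload store is guaranteed to be visible here. */
    assert(payload == 42);
    return NULL;
}
```

Drop either half of the pairing and the guarantee is gone: the reader could
see the flag but stale data, or the writer's stores could be reordered past
the flag.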
> Consider the shareable bi-linked list, where insertion requires code
> like this:
> list.Lock; //prevent concurrent access
> ... //determine affected list elements
> new.prev := prev; //prev must be guaranteed to be valid
> new.next := next;
> prev.next := new;
> next.prev := new;
> list.Unlock;
> What can we expect from the Lock method/instruction - what kind of
> synchronizaton (memory barrier) can, will or should it provide?
Lock should employ a read-write memory barrier so that all loads and
stores are done before the code continues executing.
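In C with pthreads (a sketch with names of my own; the mutex stands in for
list.Lock/list.Unlock), the quoted pseudo-code could look like the following.
Note that pthread_mutex_lock/unlock already imply the acquire and release
barriers, so no explicit barrier calls are needed inside the critical section:

```c
#include <pthread.h>

typedef struct Node {
    struct Node *prev, *next;
    int value;
} Node;

typedef struct {
    Node head;              /* sentinel: head.next/head.prev close the ring */
    pthread_mutex_t lock;
} List;

static void list_init(List *l)
{
    l->head.prev = l->head.next = &l->head;
    pthread_mutex_init(&l->lock, NULL);
}

/* Insert 'neu' after 'prev', mirroring the pseudo-code in the mail. */
static void list_insert_after(List *l, Node *prev, Node *neu)
{
    pthread_mutex_lock(&l->lock);   /* list.Lock: implies acquire barrier */
    Node *next = prev->next;        /* determine affected list elements   */
    neu->prev  = prev;              /* new.prev := prev                   */
    neu->next  = next;              /* new.next := next                   */
    prev->next = neu;               /* prev.next := new                   */
    next->prev = neu;               /* next.prev := new                   */
    pthread_mutex_unlock(&l->lock); /* list.Unlock: implies release barrier */
}
```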
> My understanding of a *full* cache synchronization would slow down not
> only the current core and cache, but also all other caches?
Fortunately that part is taken care of by the hardware and only done on
the affected cache-lines.
> If so, would it help to enclose above instructions in e.g.
> Synchronized begin
> update the links...
> end;
> so that the compiler can make all memory references (at least reads)
> occur read/write-through, inside such a code block? Eventually a global
> cache sync can be inserted on exit from such a block.
That would be too much - and would hit performance really hard.
> Conclusion:
>
> We need a documentation of the FPC specific means of cache
> synchronization, with their guaranteed effects on every target[1].
So, the memory-barrier subroutines and their implied semantics shall be
documented (if they aren't already), preferably per target. Examples of
when and why to use them would be good, too.
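As an example of the kind of documentation I have in mind - illustrated here
with C11's standalone fences, which play the same role as explicit barrier
subroutines; the function names are my own:

```c
#include <stdatomic.h>

static int data;               /* ordinary shared data     */
static _Atomic int published;  /* relaxed publication flag */

/* Writer: the release fence orders the data store before the flag store. */
static void publish(int v)
{
    data = v;
    atomic_thread_fence(memory_order_release);   /* "write barrier" */
    atomic_store_explicit(&published, 1, memory_order_relaxed);
}

/* Reader: returns the value once published, or -1 if not yet.
 * The acquire fence orders the flag load before the data load. */
static int try_consume(void)
{
    if (!atomic_load_explicit(&published, memory_order_relaxed))
        return -1;
    atomic_thread_fence(memory_order_acquire);   /* "read barrier" */
    return data;
}
```

Per-target documentation would then state what each fence compiles to (e.g. a
no-op, an "mfence", or an "lwsync"-style instruction) and which reorderings it
rules out.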
> above. When cache synchronization is a big issue, then the usage of
> related (thread-unaware) objects should be discussed as well, i.e. how
> to ensure that their use will cause no trouble, e.g. by invalidating the
> entire cache before.
Fortunately it's not; it's only the memory read/write reordering that can
cause the cores to have different views of the same data.
Vinzent.