[fpc-devel] volatile variables

Hans-Peter Diettrich DrDiettrich1 at aol.com
Thu Jun 30 11:31:23 CEST 2011


Vinzent Höfler schrieb:

>>>> Question is, what makes one variable use read/write-through, while 
>>>> other variables can be read from the cache, with lazy-write?
>>>  Synchronisation. Memory barriers. That's what they are for.
>>
>> And this doesn't happen out of thin air. How else?
> 
> Ok, maybe I misunderstood your question. Normally, these days every
> memory access is cached (and that's of course independent from compiler
> optimizations).

This should be "normal" (speedy) behaviour.

> But there are usually possibilities to define certain areas as
> write-through (like the video frame buffer). But that's more a chip-set
> thing than it has anything to do with concurrent programming.

I also thought about cache or page attributes (CONST vs. DATA sections), 
but IMO these don't offer the fine granularity that would allow 
separating synchronized from unsynchronized variables (class/record 
members!) in code. The memory manager would then also have to deal with 
two classes of synced/unsynced memory :-(

> Apart from that you have to use the appropriate CPU instructions.

Seems to be the only solution.

>>>> Is this a compiler requirement, which has to enforce 
>>>> read/write-through for all volatile variables?
>>>  No.  "volatile" (at least in C) does not mean that.
>>
>> Can you provide a positive answer instead?
> 
> Ada2005 RM:
> 
> |C.6(16): For a volatile object all reads and updates of the object as
> |         a whole are performed directly to memory.
> |C.6(20): The external effect [...] is defined to include each read and
> |         update of a volatile or atomic object. The implementation shall
> |         not generate any memory reads or updates of atomic or volatile
> |         objects other than those specified by the program.
> 
> That's Ada's definition of "volatile". C's definition is less strict, but
> should basically have the same effect.
> 
> Is that positive enough for you? ;)

Much better ;-)

But what would this mean for FPC code in general (do we *need* such 
attributes?), and what would their speed impact be? This obviously 
depends on the effects of the specific synchronizing instructions 
inserted by the compiler.


>>>> But if so, which variables (class fields...) can ever be treated as 
>>>> non-volatile, when they can be used from threads other than the main 
>>>> thread?
>>>  Without explicit synchronisation? Actually, none.
>>
>> Do you understand the implication of your answer?
> 
> I hope so. :)
> 
>> When it's up to every coder, to insert explicit synchronization 
>> whenever required, how to determine the places where explicit code is 
>> required?
> 
> By careful analysis. Although there may exist tools which detect 
> potentially
> un-synchronised accesses to shared variables, there will be no tool that
> inserts synchronisation code automatically for you.

I wouldn't like such tools, except the compiler itself :-(

Consider a shareable doubly-linked list, where insertion requires code 
like this:
   list.Lock; //prevent concurrent access
   ... //determine affected list elements
   new.prev := prev; //prev must be guaranteed to be valid
   new.next := next;
   prev.next := new;
   next.prev := new;
   list.Unlock;
What can we expect from the Lock method/instruction - what kind of 
synchronization (memory barrier) can, will or should it provide?

My understanding is that a *full* cache synchronization would slow down 
not only the current core and its cache, but also all other caches?

If so, would it help to enclose the above instructions in e.g.
   Synchronized begin
     update the links...
   end;
so that the compiler makes all memory references (at least reads) 
read/write-through inside such a code block? If necessary, a global 
cache sync could be inserted on exit from such a block.

After these considerations I'd understand that using Interlocked 
instructions in the code would ensure such read/write-through, but 
merely as a side effect - they also lock the bus for every instruction, 
which is not required when concurrent access has already been excluded 
by other means.


Conclusion:

We need documentation of the FPC-specific means of cache 
synchronization, with their guaranteed effects on every target[1].

Furthermore we need concrete examples[2] of how (and to what extent) 
these special instructions/procedures must be used, in cases like the 
above. If cache synchronization is a big issue, then the use of related 
(thread-unaware) objects should be discussed as well, i.e. how to 
ensure that their use causes no trouble, e.g. by invalidating the 
entire cache beforehand.

[1] When the effects of the "primitives" vary amongst targets, then 
either more specific documentation is required, or higher level 
target-independent procedures with guaranteed behaviour. Possibly 
both, so that the experienced coder can use conditional code for the 
different handling on various targets.

[2] References to e.g. C code samples are IMO inappropriate, because 
the user cannot know what special handling such a different compiler 
applies to its compiled code (see "volatile").

DoDi



