interlocked commands [Re: [fpc-devel] LockFree Queue algorithm]

Mon Jan 28 00:30:26 CET 2008

While watching this thread, I started to ask myself 2 questions (which 
may be related):

They just came to mind a multi-core systems where mentioned, possible 
even systems with several CPUs.
(Admiringly they are more looking like they should be on an 
intel-mailing list, I just hope someone may know...)

1)
procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue);
begin
tab[lp and fMask]:=aObject;
end;

The index ("lp and fMask") has been derived via "interlockedIncrement", 
and the surrounding code makes sure, that only one thread will access 
this value at this time.

But lets assume the value was read immediately before, by another 
core/cpu. It therefore is in that core/cpu's cache. Will this cache be 
invalidated/updated by a *simple* write to memory? Or will the other 
core/cpu see the old value from it's cache?
I am no expert on this, but from the page referred below, "lock"ed 
cpu-instructions, take special care of this.
I don't know about unlocked instructions?

http://static.scribd.com/docs/59o7jahfstz7r.swf?INITIAL_VIEW=width
chapter 7
> Because frequently used memory locations are often
> cached in a processor's L1 or L2 caches, atomic operations can often 
> be carried out
> inside a processor's caches without asserting the bus lock. Here the 
> processor's
> cache coherency protocols insure that other processors that are 
> caching the same
> memory locations are managed properly while atomic operations are 
> performed on
> cached memory locations.
"caching the same memory locations are managed properly while atomic 
operations are performed"
What does the cache coherency do (if anything) while non atomic 
operations are performed?

2)
I found various references that interlockedIncrement and co, work only 
on 32 bit bounded data? This may or may not vary on the CPU.
The Intel doc only says, it will affect execution time, but looking at 
the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx it 
says it must be on a 32bit boundary.
Does that affect FPC? ( as there may be none intel CPUs?)

If so, then the Implementation of the Queue would have to ensure 
alignment (as I believe fpc, aligns integer on 16 bit?)

Martin

Florian Klaempfl wrote:
> DarekM schrieb:
>   
>> Martin Friebe pisze:
>>     
>>> You will need to test it, but the following may also work
>>>
>>> procedure tFlQueue.push(const tm : tNodeQueue);
>>> var
>>>  newTemp,
>>>  lastTail,
>>>  newTail : integer;
>>> begin
>>>  newTemp := temp;
>>>  while newTemp >= fsize begin
>>>    // if it is true, every thread is going to attempt to fix it,
>>> before doing the increment.
>>>    // this means no thread will increase the value
>>>    //  => one thread will to succeed (since the only reason "temp NE
>>> newTemp" is that temp has been decreased)
>>>    // newTemp is bigger than fsize, so the result can not become
>>> negative.
>>>    
>>> interlockedCompareExchange(pointer(temp),pointer(newTemp-fsize),pointer(newTemp));
>>>
>>>     newTemp := temp;
>>>  end;
>>>       
>> newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are
>> always in range
>>     
>>>  lastTail:=newTemp-1;
>>>  if lastTail < 0 then lastTail := fsize -1;
>>>  setObject(lastTail,tm); // you can remove the "mod" in setobject
>>>  repeat
>>>   
>>> pointer(newTail):=interlockedCompareExchange(pointer(tail),pointer(newTemp),pointer(lastTail));
>>>
>>>  until (newTail=lastTail);
>>> end;
>>>
>>>       
>> It seems ok, but then we have 2 IF more.
>>     
>
> An if is unimportant, more important is the number of locked operations,
> especially on multi core systems they might eat hundreds of clock cycles.
> _______________________________________________
> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
> http://lists.freepascal.org/mailman/listinfo/fpc-devel
>