interlocked commands [Re: [fpc-devel] LockFree Queue algorithm]
fpc at mfriebe.de
Mon Jan 28 00:30:26 CET 2008
While watching this thread, I started to ask myself 2 questions (which
may be related):
They just came to mind a multi-core systems where mentioned, possible
even systems with several CPUs.
(Admiringly they are more looking like they should be on an
intel-mailing list, I just hope someone may know...)
procedure tFLQueue.setObject(lp : integer;const aobject : tNodeQueue);
tab[lp and fMask]:=aObject;
The index ("lp and fMask") has been derived via "interlockedIncrement",
and the surrounding code makes sure, that only one thread will access
this value at this time.
But lets assume the value was read immediately before, by another
core/cpu. It therefore is in that core/cpu's cache. Will this cache be
invalidated/updated by a *simple* write to memory? Or will the other
core/cpu see the old value from it's cache?
I am no expert on this, but from the page referred below, "lock"ed
cpu-instructions, take special care of this.
I don't know about unlocked instructions?
> Because frequently used memory locations are often
> cached in a processor's L1 or L2 caches, atomic operations can often
> be carried out
> inside a processor's caches without asserting the bus lock. Here the
> cache coherency protocols insure that other processors that are
> caching the same
> memory locations are managed properly while atomic operations are
> performed on
> cached memory locations.
"caching the same memory locations are managed properly while atomic
operations are performed"
What does the cache coherency do (if anything) while non atomic
operations are performed?
I found various references that interlockedIncrement and co, work only
on 32 bit bounded data? This may or may not vary on the CPU.
The Intel doc only says, it will affect execution time, but looking at
the MS doc http://msdn2.microsoft.com/en-us/library/ms683614.aspx it
says it must be on a 32bit boundary.
Does that affect FPC? ( as there may be none intel CPUs?)
If so, then the Implementation of the Queue would have to ensure
alignment (as I believe fpc, aligns integer on 16 bit?)
Florian Klaempfl wrote:
> DarekM schrieb:
>> Martin Friebe pisze:
>>> You will need to test it, but the following may also work
>>> procedure tFlQueue.push(const tm : tNodeQueue);
>>> newTail : integer;
>>> newTemp := temp;
>>> while newTemp >= fsize begin
>>> // if it is true, every thread is going to attempt to fix it,
>>> before doing the increment.
>>> // this means no thread will increase the value
>>> // => one thread will to succeed (since the only reason "temp NE
>>> newTemp" is that temp has been decreased)
>>> // newTemp is bigger than fsize, so the result can not become
>>> newTemp := temp;
>> newTemp:=interlockedIncrement(temp) mod fsize; // make sure we are
>> always in range
>>> if lastTail < 0 then lastTail := fsize -1;
>>> setObject(lastTail,tm); // you can remove the "mod" in setobject
>>> until (newTail=lastTail);
>> It seems ok, but then we have 2 IF more.
> An if is unimportant, more important is the number of locked operations,
> especially on multi core systems they might eat hundreds of clock cycles.
> fpc-devel maillist - fpc-devel at lists.freepascal.org
More information about the fpc-devel