[fpc-devel] LockFree Queue algorithm

Wed Jan 30 23:36:29 CET 2008

Florian Klaempfl wrote:
> Michael Schnell schrieb:
>> 
> 
> The point is, it simply affects all processor. Its much better than
> an entercriticalsection but it is not only twice the time of a simply
> inc or whatever.

I want two give my two cent's too.

In the last 4 years, we developed a billing application
(fpc/linux/daemons) for VOIP calls.
The application bills in realtime cdr records generated from the telko
switch.
In one hour there are up to 15000 records to bill. In one month there
are estim. 5 million 
billed records in the database. (You must process estimately 7 Events, 
Start Call Event, Stop Call Event, Intermediate Timestamps to get one of
the 5 million CDR's).
(CDR := http://en.wikipedia.org/wiki/Call_detail_record)

In short: We must process a huge amount of data fast...

The application must find start and stop events in 7 different inmemory
data structures.
We use a multithreaded concept. A database reader, mediator threads
(finding start and stop)
,guiding threads (finding customers to call-id's),a rating threads
(making the price splitting a long the
time and destination axis), a db writing thread (writing results), a
fraud detection thread (checking per customer limits).
All threads use the same inmemory data structures.

The overall speed of the application is critical, as it makes a
difference if a re(rating) of a
month for 10000 Telko customers producing 10.000.000 of CDR's on 10000
phone bills takes 6 hours or
6 Weeks :-) (On a simple Multicore System)

We were (re)implemented the thing 3 times. The code is around 100.000
Lines.

Our conclusions for COMPLEX Algorithms are: (We read more often then we
write)

Involved Locking Structures:

A)-> TSemaphores / Critical Sections
Slow:-> Cause slow Writers block Readers too much
Deadlock Bugs -> if wrong Locking order -> Hard to debug.
Priority Inversion: High Priority operations gets blocked due to lock
held by
                    low prio thread. (Even harder to debug) 

B)->Multireader, Single Writer Locks
Faster:-> Due to the Multi Reader, Single Writer Locks
Much more Deadlock(!) Bugs, due to much more Readers.
Priority Inversion: High Priority operations gets blocked due to lock
held by
                    low prio thread. 

C) Lockfree Structures
No Deadlocks possible, No priority inversion,
No needless Waiting on Semaphores, if the Scheduler puts a Thread out of
the execution path
while in a Critical Section or holding a Lock.

Currently we have B) Running in Real World and C) Running in Testing.

C) Is not fully implemented due to large algorithmic changes. But in
prototypes and testing
Case C) is very promising because it is several times faster (not
serveral percents) than
Cases A and B).

The time spent on debugging A) and B) was enormous.

I think we all could need a good lockfree datastructure library for
Freepascal.
Our current implementation is not suitable as it don't cares for
multiplatform, and
64 Bit architectures and is too tightly bound with our internals.

But if someone is interested, we could work together to make something
new.
At least i have some links as starting points.

BTW: 
We are searching for a FPC/Delphi Programmer willing to work here in our
small team
in Graz/Austria.

helmut