AW: Re: [fpc-pascal] "Generics" Red Black Tree for fpc

Fri Apr 3 23:33:28 CEST 2009

Von: Mattias Gaertner <nc-gaertnma at netcologne.de>
Gesendet: Fr 03.04.2009 18:07
An: fpc-pascal at lists.freepascal.org; 
Betreff: Re: [fpc-pascal] "Generics" Red Black Tree for fpc

> On Fri, 3 Apr 2009 17:16:50 +0200
> Helmut Hartl <helmut.hartl at firmos.at> wrote:
> 
> > Von: Mattias Gaertner <nc-gaertnma at netcologne.de>
> > Gesendet: Fr 03.04.2009 16:51
> > An: fpc-pascal at lists.freepascal.org; 
> > Betreff: Re: [fpc-pascal] "Generics" Red Black Tree for fpc
> > 
> > > How much work do you think is it to extend it to accept duplicate
> > > keys? Mattias
> > 
> > How probable are duplicate keys in your usecase? / what is the use
> > case ? It would be easy to add them as list per key node natively,
> > but that would be possible with the current version too if you use a
> > listtype as the storage type ...
> 
> Of course duplicates are very unlikely, but chance is not 0. Approx 80%
> of my trees must therefore support duplicates.
> Using lists in the nodes would cost speed and memory (duplicates are
> seldom), so I would prefer a more direct support.
> 
> Mattias

I can think of two possible ways of allowing duplicates:

a) the internal node structure gets another pointer for the next duplicated storage value. 
(single linked list). The tree structure does not change. If duplicates are not very
probable the linked list is sufficiently small to not give a huge performance penalty.
The order of access would then be log(n)+k, where k ist the amount of duplicates.
We have one pointer more per node. The time to scan the whole tree linear would be n*log(n).
The tree semantics for a random access with duplicates would not be very clean: 
1) first scan for a key
2) check if duplicates exist
3) scan the duplicate list
Implementation would be easy, memory cost just one additional pointer, maybe the duplicate count in the node.

b)
We add 2 pointers in double linked list fashion to the tree node structure and link each node against its pred and next node. If we insert a duplicate we do it only in the list, not in the tree structure.
We could then lineary scan the tree faster in the order of n, by traversing the list and rembering the last element - and in log(n) for a random access. 
Semantics for the access would be cleaner.
1) Search the key
2) call next until the key changes

Background Info:
This tree + semantic is designed for heavy multiprocessor/multithreading loads, therefor the
interface must be kept as simple as possible. If you have multiple threads reading and writing concurrently on the tree, one thread(A) can search the tree by only rembering the active key, 
while two other thread (C,D) may delete or insert nodes. If thread A gets interupted in the scan then he can continoue with his last key value and finds the next node, making progress.
If thread A would remember a pointer to the node, he could search the tree in order n (not the described n*log(n) but has to deal with deleted nodes, ABA-Problems and much headache - or the tree (as directory) must be locked as a whole while processing a linear scan. (which is practically two inefficient and slow).

Which means as conclusion that the speed benefit of solution b) will vanish.
A solution to the whole cause could be a STM (software transactional memory) which i currently 
work on - but thats far away from production quality ...

So i would propose solution A) with the following semantics for a random acess - if the
tree would be intended for multithreaded usage - for single threaded usage i would propose solution B).

A)
1) Search the key with a function:
function Find(const key:_TKey;out node:_TStore;const DuplicateIterator:TIteratorFunc):Boolean;
where you pass an iteratorfunction which gets called for every extra duplicate.

The multithreaded access could then be:
1) Lock the directory (r/b tree) 
2) Find the key and fetch the objects and duplicates, lock them
3) unlock the tree
4) process the objects/data

i hope i was somewhat clear :-) 

If it is of value i would implement a solution.

 helmut