[fpc-devel] Generic Programming Units

Wed Jun 22 14:03:04 CEST 2005

Michael Van Canneyt wrote:

> What is the performance difference between a hash() and a binary search on
> an ordered list ?

IMO perfomance heavily depends on the use of the lists. An ordered list
requires some efforts when entering or removing items, and typically
requires more comparisons than a hashed list. Access to a hashed list,
with a reasonable hash table size, requires not much more than one
comparison, whereas an ordered list requires log(n) comparisons.

A hash algorithm can be very simple, in detail for generic hash tables
with no constraints on the distribution of the element keys. For
collission handling I prefer a linked list of the affected elements,
over re-hashing.

> I've also been working with an 'associative' stringlist, but
> I was using an ordered stringlist to keep the data, so a binary search is done.

When the list elements are stored by reference, it's possible to
maintain any number of related tables (indices, hash tables). When a
sorted list is already required for the elements, an (additional) hash
table is optional.

As long as the same list interface is implemented for all access models,
it's quite easy to exchange the implementation in order to find out the
best performant model for every specific case. The comparison functions
should already be exchangeable, according to the nature of the key
fields, and the hash algorithms must be exchangeable for the same
reason. In so far I wouldn't restrict a library to one specific
implementation model for every container type.

In my parser symbol table I add all symbols in creation order to the
table, and maintain a hash table for quick access. The hash table grows
together with the number of elements, what requires a reorganization of
the hash table whenever the symbol list is extended, so that a
reasonable starting capacity and increment should be choosen; I double
the capacity on every extension. Elements of the same name, but in
different scopes, are entered into the collission list of the hash
table, again in creation order. This simplifies access to the newest
symbols, in the current scope, and also simplifies the removal of a
complete scope from the symbol and hash tables. Please note that
duplicate keys require special efforts in sorted lists, when the
creation order of the elements, or any other secondary sorting
criterium, is significant.

DoDi