[fpc-pascal] Hashmap for integers

Sun Aug 22 18:59:22 CEST 2010

On Sunday 22 August 2010 18:51:21 Andreas Schneider wrote:
> Afaik, inserting items into a sorted List is done via InsertSort, which is
> the fastest way (O(n))to go (IMHO), so you do /not/ have to resort the
> list.

True. I could optimize the current implementation a little. Now I insert and 
then sort. However, inserting means moving lots of data which is slow, and on 
the other hand quicksort is very fast when data is "almost sorted" already.
I would not expect any big speed-up there.

> I am not really sure, how else you would implement a hash map ...
> internally you always have to have some list like structure, be it an array
> or a linked list. A tree wouldn't be really fast for lots of inserts, since
> rebalancing a tree isn't that cheap either.
> Ok, sure, you could have a huge array of buckets that are either assigned
> or not, but IMHO that would waste too much memory or would have too much
> overlapping items/hashes.

"waste too much memory" ... No, hash arrays are always implemented as an array 
of buckets. The only drawback is that you must know about the size of your 
data in advance. Otherwise the array is too big and wastes memory or it is too 
small and thus slow because of overlapping items/hashes.

I can tell about my experience of a string hashmap with Delphi.
I had a reporting feature in my application. The huge amount of data was read 
from an in-memory tree-DB which is very fast and didn't slow things down. (I 
plan to publish the related component soon...)
Due to a complex data structure there were duplicate lines in the report. It 
was not enough to check for duplicates in 1 record, I had to check the whole 
generated report line. I used TStringList for that. The report took > 20 
minutes to generate... I optimized, pre-calculated things... the time dropped 
to 8 minutes. Not bad.
I realized most time was spent checking for duplicate report lines. I figured 
by using a hash map I could cut that time maybe to half, to 4 minutes (!).

I used the class StringHashMap which I later extracted and ported to FPC and 
published:
  http://wiki.freepascal.org/StringHashMap
It seems to be a very fast implementation of a string hash map.

After that the report was generated in 6 seconds (not minutes, seconds!).
I was looking at the memory consumption. According to Windows resource diagram 
850 MB mem was used with the old version. With the new version it was maybe 
860 MB. Who cares really... Total mem was 1 GB.

> Also I think "map" is the right term for what the TFPGMap does, even if it
> inherits from a list. A mapping is simply a relation from some key to some
> data, and thats exactly what TFPGMap does.

True, the name "map" does not define the implementation. But still, most often 
a "map" is implemented as a hashmap. I was surprised that it is not so in this 
case. I felt almost like cheated in fact ... sorry :-)

Juha