[fpc-pascal] Hashmap for integers
Juha Manninen (gmail)
juha.manninen62 at gmail.com
Sun Aug 22 18:59:22 CEST 2010
On Sunday 22 August 2010 18:51:21 Andreas Schneider wrote:
> Afaik, inserting items into a sorted List is done via InsertSort, which is
> the fastest way (O(n))to go (IMHO), so you do /not/ have to resort the
> list.
True. I could optimize the current implementation a little. Now I insert and
then sort. However, inserting means moving lots of data which is slow, and on
the other hand quicksort is very fast when data is "almost sorted" already.
I would not expect any big speed-up there.
> I am not really sure, how else you would implement a hash map ...
> internally you always have to have some list like structure, be it an array
> or a linked list. A tree wouldn't be really fast for lots of inserts, since
> rebalancing a tree isn't that cheap either.
> Ok, sure, you could have a huge array of buckets that are either assigned
> or not, but IMHO that would waste too much memory or would have too much
> overlapping items/hashes.
"waste too much memory" ... No, hash arrays are always implemented as an array
of buckets. The only drawback is that you must know about the size of your
data in advance. Otherwise the array is too big and wastes memory or it is too
small and thus slow because of overlapping items/hashes.
I can tell about my experience of a string hashmap with Delphi.
I had a reporting feature in my application. The huge amount of data was read
from an in-memory tree-DB which is very fast and didn't slow things down. (I
plan to publish the related component soon...)
Due to a complex data structure there were duplicate lines in the report. It
was not enough to check for duplicates in 1 record, I had to check the whole
generated report line. I used TStringList for that. The report took > 20
minutes to generate... I optimized, pre-calculated things... the time dropped
to 8 minutes. Not bad.
I realized most time was spent checking for duplicate report lines. I figured
by using a hash map I could cut that time maybe to half, to 4 minutes (!).
I used the class StringHashMap which I later extracted and ported to FPC and
published:
http://wiki.freepascal.org/StringHashMap
It seems to be a very fast implementation of a string hash map.
After that the report was generated in 6 seconds (not minutes, seconds!).
I was looking at the memory consumption. According to Windows resource diagram
850 MB mem was used with the old version. With the new version it was maybe
860 MB. Who cares really... Total mem was 1 GB.
> Also I think "map" is the right term for what the TFPGMap does, even if it
> inherits from a list. A mapping is simply a relation from some key to some
> data, and thats exactly what TFPGMap does.
True, the name "map" does not define the implementation. But still, most often
a "map" is implemented as a hashmap. I was surprised that it is not so in this
case. I felt almost like cheated in fact ... sorry :-)
Juha
More information about the fpc-pascal
mailing list