[fpc-devel] TFPHashList (Was: Alternative parsers)

Wed Oct 20 09:39:04 CEST 2010

On Wed, Oct 20, 2010 at 05:57, Sergei Gorelkin <sergei_gorelkin at mail.ru> wrote:
> Michael Van Canneyt пишет:
>>
>>> I believe it's not ShortStrings per se to blame, but storing them in a
>>> large single memory block as it is being done in TFPHashList. Adding a lot
>>> of keys will cause many reallocations. Large size of the block forces memory
>>> manager to use variable-size pool, which is less efficient than using
>>> fixed-size blocks in case of small chunks.
>>
>> Strange. The whole idea of the implementation was to improve memory
>> efficiency.
>> And at the time, it was definitely and consistently faster.
>> If Alexander sends me his code, I will check, and then I should ask Peter
>> Vreman, who wrote the code. (if he'll still answer).

Memory efficiency is a tricky notion.
True, shortstring TFPHashList stores the keys very efficiently,
but the ansistring variant may probably
avoid storing them at all due to refcounting.
So, TFPHashList may plausibly end up both taking more memory
(due to key duplication) _and_ more time (due to key copying).

> It is indeed very efficient regarding number of bytes occupied. Storing all
> strings in one block has no storage overhead at all.
> Stressing this implementation with huge amount of Add's won't show the best
> speed, but it has little common with the intended usage scenario, which is
> "single Add, multiple Find's". Also in the compiler, the average number of
> items in a single TFPHashList is not huge. It can grow huge in units like
> Windows (many procedures in one namespace), but these are more like extreme
> cases.

Ok, I re-tested with 2*10^6 Find's. Then AnsiString variants wins even
for length 5:

ShortString: 0.547 s
AnsiString: 0.437 s

-- 
Alexander S. Klenin