[fpc-pascal] How create a full text search with TChmWriter?
Mattias Gaertner
nc-gaertnma at netcologne.de
Wed Feb 22 08:30:23 CET 2012
On Tue, 21 Feb 2012 22:25:08 -0500
Andrew Haines <AndrewD207 at aol.com> wrote:
> On 02/21/12 17:40, Mattias Gaertner wrote:
> > On Tue, 21 Feb 2012 16:08:43 +0100 (CET)
> > Mattias Gaertner <nc-gaertnma at netcologne.de> wrote:
>[...]
> > But it only finds whole words. :-
> > And clicking on a page gives a black page in lhelp. :(
>
> Matching only whole words follows from how the words are indexed. It would
> be fairly easy to match a partial word against the beginning of an indexed
> word. Beyond that, if you want to find "here" in "there", you would have to
> dump the search index and create a second search index -> ugly
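To illustrate the prefix idea, here is a minimal sketch in Python. It is not the TChmWriter or lhelp code. It assumes the indexed words are available as a sorted list, and the names `prefix_matches` and `indexed_words` are made up for this example. A binary search finds the first word that is not smaller than the prefix, and the scan stops at the first word that no longer starts with it.

```python
import bisect


def prefix_matches(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix.

    sorted_words must already be sorted; this is an assumption about
    how the index is stored, not a statement about the CHM format.
    """
    # First position whose word is >= prefix; all matches follow it directly.
    start = bisect.bisect_left(sorted_words, prefix)
    matches = []
    for word in sorted_words[start:]:
        if not word.startswith(prefix):
            break  # sorted order means no later word can match
        matches.append(word)
    return matches


# Hypothetical index contents, for illustration only.
indexed_words = sorted(["here", "help", "hero", "there", "index", "search"])

if __name__ == "__main__":
    print(prefix_matches(indexed_words, "her"))
    print(prefix_matches(indexed_words, "here"))
```

Searching for "here" this way never reaches "there", because the match is anchored at the start of each word. Finding substrings would need a different index, which is the ugly part mentioned above.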
> >
> > Processing all files required 12 minutes and terrifying 4GB ram. :(
> > Then comes some final part and it needed 9GB. I only have 8 so it
> > became very slow. :(
>
> "terrifying 4GB ram." :) That made me laugh :)
>
> Does generating a search index change the memory usage that significantly?
Yes sir.
Without search: 8 minutes and 600MB.
> > Then it went down to 5GB.
> > Finally it crashed with an AV, just like with the LCL chm.
> > And I have no chm.
> >
> > Maybe some 64bit issue?
>
> I had no crash.
Should I try 2.6.1?
> I made an artificial chm file that contained the same file with a
> different name 4000 times.
>
> The HTML file was 13k bytes x 4000 (around 52 MB)
>
> The chm was 2.9 MB
>
> (I enabled LZX_USE_THREADS in chmwriter.pas)
>
> time project1
>
> real 10m50.497s
> user 36m9.276s
> sys 0m3.459s
Ehm, 10 minutes for 52MB of text is still a lot, isn't it?
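For a rough sense of scale, the quoted figures can be turned into a throughput number. This is only a back-of-the-envelope sketch in Python. It assumes "13k bytes" means 13,000 bytes per file, and the helper names are made up for this example.

```python
def to_seconds(minutes, seconds):
    """Convert a 'time' style reading such as 10m50.497s to seconds."""
    return minutes * 60 + seconds


def throughput_bytes_per_second(total_bytes, wall_seconds):
    """Input bytes processed per second of wall-clock time."""
    return total_bytes / wall_seconds


# Figures from the quoted run; 13,000 bytes per file is an assumption.
total_bytes = 13_000 * 4000           # about 52 MB of HTML
real_seconds = to_seconds(10, 50.497)  # real 10m50.497s
user_seconds = to_seconds(36, 9.276)   # user 36m9.276s

throughput = throughput_bytes_per_second(total_bytes, real_seconds)
cpu_ratio = user_seconds / real_seconds  # average number of busy cores

if __name__ == "__main__":
    print(f"{throughput / 1000:.1f} kB/s wall clock")
    print(f"{cpu_ratio:.2f} cores busy on average")
```

Under that assumption the run handles roughly 80 kB of HTML per wall-clock second while keeping a bit more than three cores busy, which fits LZX_USE_THREADS being enabled.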
> According to top I used ~320 MB of memory
>
> I guess my chm does not have enough unique words and this is why the
> memory usage is so low.
Mattias