[fpc-pascal] How create a full text search with TChmWriter?

Mattias Gaertner nc-gaertnma at netcologne.de
Wed Feb 22 08:30:23 CET 2012


On Tue, 21 Feb 2012 22:25:08 -0500
Andrew Haines <AndrewD207 at aol.com> wrote:

> On 02/21/12 17:40, Mattias Gaertner wrote:
> > On Tue, 21 Feb 2012 16:08:43 +0100 (CET)
> > Mattias Gaertner <nc-gaertnma at netcologne.de> wrote:
>[...]
> > But it only finds whole words. :(
> > And clicking on a page gives a black page in lhelp. :(
> 
> Matching only whole words follows from how the words are indexed. It
> would be fairly easy to match a partial word against the beginning of
> an indexed word. Beyond that, if you want to find "here" in "there",
> you would have to dump the search index and create a second search
> index -> ugly
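
A minimal sketch of the prefix match described above, written in Python for brevity and not taken from chmwriter. It assumes the indexed words are available as a sorted list (a hypothetical stand-in for the real CHM index structure), so all words sharing a prefix form one contiguous run that a binary search can locate. The function name and sample words are made up for illustration.

```python
from bisect import bisect_left

def prefix_matches(sorted_words, prefix):
    """Return every word in sorted_words that starts with prefix.

    sorted_words must be sorted ascending; it stands in for the
    whole-word search index (an assumption, not the CHM format).
    """
    # First position whose word is >= prefix; prefix matches start here.
    i = bisect_left(sorted_words, prefix)
    matches = []
    # Words sharing the prefix are adjacent in sorted order.
    while i < len(sorted_words) and sorted_words[i].startswith(prefix):
        matches.append(sorted_words[i])
        i += 1
    return matches

# Hypothetical index contents.
index = sorted(["here", "help", "hello", "there", "index", "search"])

print(prefix_matches(index, "he"))    # words beginning with "he"
print(prefix_matches(index, "here"))  # does not include "there"
```

A substring such as "here" inside "there" is not reachable this way, because "there" sorts under "t"; that case is what would need the second index.
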
> > 
> > Processing all files required 12 minutes and terrifying 4GB ram. :(
> > Then comes some final part and it needed 9GB. I only have 8 so it
> > became very slow. :(
> 
> "terrifying 4GB ram." :) That made me laugh :)
> 
> The memory usage is significantly changed by generating a search index?

Yes sir.
Without search: 8 minutes and 600MB.

 
> > Then it went down to 5GB.
> > Finally it crashed with an AV, just like with the LCL chm.
> > And I have no chm.
> > 
> > Maybe some 64bit issue?
> 
> I had no crash.

Should I try 2.6.1?

 
> I made an artificial chm file that contained the same file with a
> different name 4000 times.
> 
> The HTML file was 13 KB x 4000 (around 52 MB).
> 
> The chm was 2.9 MB.
> 
> (I enabled LZX_USE_THREADS in chmwriter.pas)
> 
> time project1
> 
> real	10m50.497s
> user	36m9.276s
> sys	0m3.459s

Ehm, 10 minutes for 52MB of text is still a lot, isn't it?
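
For scale, a rough calculation from the figures quoted above. It is only a sketch: the helper name is made up for illustration, and "13k" is taken to mean 13,000 bytes.

```python
def to_seconds(minutes, seconds):
    """Convert a `time`-style m/s pair to seconds."""
    return minutes * 60 + seconds

real = to_seconds(10, 50.497)   # wall-clock time from the quoted run
user = to_seconds(36, 9.276)    # CPU time summed over all threads
total_bytes = 13_000 * 4000     # assumed input size, about 52 MB

throughput_kb_s = total_bytes / real / 1000  # input processed per second
cores_busy = user / real                     # average number of busy cores

print(f"{throughput_kb_s:.0f} KB/s, {cores_busy:.1f} cores busy on average")
```
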

 
> According to top I used ~320 MB of memory
> 
> I guess my chm does not have enough unique words and this is why the
> memory usage is so low.


Mattias