[fpc-pascal] Re: Text scan in text files - (was: Full text scan - PDF files)

Tomas Hajny XHajT03 at hajny.biz
Mon Nov 1 19:31:31 CET 2010


On Mon, November 1, 2010 19:10, Marco van de Voort wrote:
> In our previous episode, Marcos Douglas said:
>> <albertonarduzzi at yahoo.com> wrote:
>> >> Somebody can help me please?
>> >> I need to search strings in Text files using just FPC.
>> >
>> > how about reading every line and then using Pos() to see if some
>> > string is there?
>> >
>>
>> I don't think this way is the fast way   :(
>> I have many PDF files with several pages each.
>
> You'll be surprised. I've done multi million line logfiles that way. A
> pdf2txt is infinitely slow compared with such processing.

Well, there are at least two gotchas there. First, it's better to use a
reasonably large buffer size. Second, the simplest approach (reading
line by line and searching each line with Pos()) obviously can't find a
string that spans a line break, i.e. you either need to handle that
yourself, or use some unit providing such functionality.
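
A minimal sketch of both points, untested against real data and only one
of several possible ways to do it. The names FindInTextFile and BufSize,
the 256 KB buffer size, and the decision to treat a line break as a
single space are assumptions made for illustration; only SetTextBuf(),
ReadLn() and Pos() are standard RTL routines.

```pascal
program ScanText;
{$mode objfpc}{$H+}

{ Returns the number of the line being processed when Pattern is first
  found, or 0 if it is not found. A line break is treated as one space
  (an assumption of this sketch), so 'foo bar' also matches 'foo' at the
  end of one line followed by 'bar' at the start of the next. }
function FindInTextFile(const FileName, Pattern: string): Integer;
const
  BufSize = 256 * 1024;  { arbitrary size, chosen for illustration }
var
  F: Text;
  Buf: array of Byte;
  Line, Carry, Window: string;
  LineNo, Keep: Integer;
begin
  Result := 0;
  if Pattern = '' then
    Exit;
  Keep := Length(Pattern) - 1;  { longest tail that can start a match }
  SetLength(Buf, BufSize);
  Assign(F, FileName);
  SetTextBuf(F, Buf[0], BufSize);  { set the buffer before opening }
  Reset(F);                        { fails at run time if file is missing }
  try
    Carry := '';
    LineNo := 0;
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      Inc(LineNo);
      { Prepend the tail of the preceding text so that a match
        spanning one or more line breaks is still visible to Pos(). }
      if LineNo = 1 then
        Window := Line
      else
        Window := Carry + ' ' + Line;
      if Pos(Pattern, Window) > 0 then
      begin
        Result := LineNo;
        Break;
      end;
      { Keep only the last Length(Pattern) - 1 characters. }
      if Length(Window) > Keep then
        Carry := Copy(Window, Length(Window) - Keep + 1, Keep)
      else
        Carry := Window;
    end;
  finally
    Close(F);
  end;
end;

begin
  if ParamCount < 2 then
  begin
    WriteLn('usage: scantext <file> <pattern>');
    Halt(1);
  end;
  WriteLn(FindInTextFile(ParamStr(1), ParamStr(2)));
end.
```

Because the carried-over tail is always shorter than the pattern, a
match can never lie entirely inside it, so nothing is reported twice.
If line breaks should be matched literally or ignored altogether, the
' ' used when building Window would have to change accordingly.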

Tomas




