[fpc-pascal] Re: Text scan in text files - (was: Full text scan - PDF files)

Marcos Douglas md at delfire.net
Mon Nov 1 19:34:49 CET 2010


On Mon, Nov 1, 2010 at 3:31 PM, Tomas Hajny <XHajT03 at hajny.biz> wrote:
> On Mon, November 1, 2010 19:10, Marco van de Voort wrote:
>> In our previous episode, Marcos Douglas said:
>>> <albertonarduzzi at yahoo.com> wrote:
>>> >> Somebody can help me please?
>>> >> I need to search strings in Text files using just FPC.
>>> >
>>> > how about reading every line and then using Pos() to see if some
>>> > string is there?
>>> >
>>>
>>> I don't think this way is the fast way   :(
>>> I have many PDF files with several pages each.
>>
>> You'll be surprised. I've done multi million line logfiles that way. A
>> pdf2txt is infinitely slow compared with such processing.
>
> Well, there are at least two gotchas there. First, it's better to use a
> reasonable (= large enough) buffer size. Second, the simplest approach
> implying reading line by line and searching using Pos() obviously isn't
> sufficient for searching across line breaks, i.e. you either need to
> handle that yourself, or use some unit providing such functionality.
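
For reference, handling both points by hand might look roughly like the
sketch below. It is only a minimal illustration, not a recommendation from
this thread: FileContains and the 64 KB buffer size are names and values I
made up, and it assumes that a line break inside a match should count as a
single space. SetTextBuf enlarges the read buffer, and a short tail of the
previous line is carried over so that Pos() also sees matches that span a
line break.

```pascal
program SearchText;
{$mode objfpc}{$H+}

{ Hypothetical helper: returns True if Needle occurs in the text file,
  including when a line break falls inside the match. A line break is
  treated as one space (an assumption made for this sketch). }
function FileContains(const FileName, Needle: string): Boolean;
var
  F: Text;
  Buf: array[0..65535] of Byte;   // 64 KB read buffer (arbitrary size)
  Line, Carry, Window: string;
begin
  Result := False;
  if Needle = '' then
    Exit;
  Carry := '';
  Assign(F, FileName);
  SetTextBuf(F, Buf);             // must be called before Reset
  Reset(F);
  try
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      // Tail of the previous line(s) + current line
      Window := Carry + Line;
      if Pos(Needle, Window) > 0 then
        Exit(True);               // the finally block still closes the file
      // Keep the last Length(Needle)-1 characters, with the line break
      // replaced by a space, for the next iteration
      Window := Window + ' ';
      if Length(Window) >= Length(Needle) then
        Carry := Copy(Window, Length(Window) - Length(Needle) + 2,
                      Length(Needle) - 1)
      else
        Carry := Window;
    end;
  finally
    Close(F);
  end;
end;

begin
  if ParamCount < 2 then
  begin
    WriteLn('usage: searchtext <file> <text>');
    Halt(2);
  end;
  if FileContains(ParamStr(1), ParamStr(2)) then
    WriteLn('found')
  else
    WriteLn('not found');
end.
```

This does not cope with trailing spaces before a line break or with
hyphenated words, so it is not a replacement for a proper unit.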

Which unit do you recommend?

Marcos Douglas


