[fpc-pascal] Re: Text scan in text files - (was: Full text scan - PDF files)

Marcos Douglas md at delfire.net
Mon Nov 1 19:54:28 CET 2010


On Mon, Nov 1, 2010 at 3:50 PM, Tomas Hajny <XHajT03 at hajny.biz> wrote:
> On Mon, November 1, 2010 19:42, Marcos Douglas wrote:
>> On Mon, Nov 1, 2010 at 3:36 PM, Tomas Hajny <XHajT03 at hajny.biz> wrote:
>>> On Mon, November 1, 2010 19:20, Marcos Douglas wrote:
>>>> On Mon, Nov 1, 2010 at 3:10 PM, Marco van de Voort <marcov at stack.nl> wrote:
>>>>> In our previous episode, Marcos Douglas said:
>>>>>> <albertonarduzzi at yahoo.com> wrote:
>>>>>> >> Can somebody help me, please?
>>>>>> >> I need to search for strings in text files using just FPC.
>>>>>> >
>>>>>> > how about reading every line and then using Pos() to see if some
>>>>>> > string is there?
>>>>>> >
>>>>>>
>>>>>> I don't think this is the fastest way :(
>>>>>> I have many PDF files with several pages each.
>>>>>
>>>>> You'll be surprised. I've done multi million line logfiles that way. A
>>>>> pdf2txt is infinitely slow compared with such processing.
>>>>
>>>> Ok, but I need to work with pure text. The PDF files are "compiled".
>>>
>>> Sorry, I'm confused now. Do you want to search for text strings within
>>> PDF files or within regular text files?
>>
>> I need to search for text strings in PDF files... but I don't know how
>> to do that, so I use pdf2txt[1] to convert them to plain text.
>> If there is a unit that can search for text within a PDF file, that
>> would be the best solution for me.
>
> I see, but that means that Marco's comment applies - conversion of PDF
> files to pure text will probably require more time than finding your
> string within the (converted) text file.

I agree. But how do I search for text within PDF files?
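
For the converted text itself, the line-by-line Pos() approach suggested
above would look roughly like this. It is only a sketch: the program name
findtext and its two command-line parameters are made up for illustration,
it assumes the PDF has already been run through pdf2txt, and the match is
case-sensitive.

```pascal
program findtext;
{$mode objfpc}{$H+}

{ Hypothetical example: prints every line of a text file that contains
  a given string. Usage: findtext <file> <string> }

var
  F: TextFile;
  Line, Needle: string;
  LineNo: Integer;
begin
  if ParamCount < 2 then
  begin
    WriteLn('Usage: findtext <file> <string>');
    Halt(1);
  end;

  Needle := ParamStr(2);
  AssignFile(F, ParamStr(1));
  Reset(F);                      { open the file for reading }
  try
    LineNo := 0;
    while not Eof(F) do
    begin
      ReadLn(F, Line);
      Inc(LineNo);
      { Pos returns 0 when Needle does not occur in Line }
      if Pos(Needle, Line) > 0 then
        WriteLn(LineNo, ': ', Line);
    end;
  finally
    CloseFile(F);
  end;
end.
```

This only covers the plain-text half; a string that pdf2txt splits across
two lines would not be found by a per-line search.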

Marcos Douglas


