[fpc-pascal] PDF indexing

Wed Jun 24 08:13:04 CEST 2015

On Wed, 24 Jun 2015, Marc Santhoff wrote:

> On Tue, 2015-06-23 at 09:10 +0200, Michael Van Canneyt wrote:
>>
>> On Tue, 23 Jun 2015, Marc Santhoff wrote:
>>
>>> On Sun, 2015-06-21 at 00:33 +0200, Michael Van Canneyt wrote:
>>>>
>>>> On Sat, 20 Jun 2015, Marc Santhoff wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> does fpc (or lazarus) have a helper class for indexing the content of
>>>>> PDF files?
>>>>
>>>> check packages/fpindexer
>>>>
>>>> I have used it to create full text searches on a database.
>>>> You should be able to adapt the base code to create an index of a PDF.
>>>
>>> That looks pretty interesting. And it has some docs, wow.
>>>
>>> If I understand correctly, I'd only have to implement a class TIReaderPDF;
>>> the difference from simple text reading is the part that extracts a
>>> text stream, or the text parts of the stream, rejecting the PDF commands
>>> (if they are in there; I need to look at PowerPDF).
>>
>> Yes, that would be correct.
>
> Many thanks, Michael.
>
> Currently I'm searching for a PDF access library that could help with that.
> The only one that comes halfway close so far is this one:
>
> http://itextpdf.com/functionality
>
> Open source, but with a license similar to the LGPL without an exception.
> Still searching ...

But that is Java or .NET.

Depending on your platform, you could try the Gnostice products. It's Delphi code,
but they are quite open, and I was told the upcoming rework of their products
will make support for Lazarus possible.

Michael.
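
For illustration only, here is a rough sketch of the "keep the text parts,
reject the PDF commands" idea discussed above. It is written in Python rather
than Pascal and is not the fpindexer API; the names extract_text and
read_literal_string are made up for this example. It assumes the page content
stream has already been decompressed and uses a simple single-byte encoding.
Hex strings, inline images and font encoding maps are ignored, so treat it as
a starting point for a TIReaderPDF-style reader, not a complete one.

```python
import re

# Operators that paint text in a PDF content stream: Tj, TJ, ' and "
TEXT_SHOWING = {"Tj", "TJ", "'", '"'}

_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
            "(": "(", ")": ")", "\\": "\\"}
_OCTAL = re.compile(r"[0-7]{1,3}")
_TOKEN = re.compile(r"[^\s()\[\]<>%]+")
_NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)")


def read_literal_string(data, pos):
    """Parse a literal string starting at data[pos] == '('.

    Returns (text, index just past the closing parenthesis)."""
    depth = 0
    out = []
    i = pos
    while i < len(data):
        ch = data[i]
        if ch == "\\" and i + 1 < len(data):
            nxt = data[i + 1]
            if nxt in _ESCAPES:
                out.append(_ESCAPES[nxt])
                i += 2
            elif nxt in "01234567":
                m = _OCTAL.match(data, i + 1)
                out.append(chr(int(m.group(), 8) & 0xFF))
                i = m.end()
            elif nxt in "\r\n":
                i += 2  # backslash + end of line: line continuation
            else:
                i += 1  # unknown escape: the backslash is dropped
            continue
        if ch == "(":
            depth += 1
            if depth > 1:
                out.append(ch)  # balanced nested parenthesis is kept
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
            out.append(ch)
        else:
            out.append(ch)
        i += 1
    return "".join(out), i  # unterminated string: return what we have


def extract_text(content):
    """Return the text shown by a decoded content stream, commands removed."""
    pieces = []   # text of each text-showing operation
    pending = []  # string operands seen since the last operator
    i, n = 0, len(content)
    while i < n:
        ch = content[i]
        if ch == "(":
            text, i = read_literal_string(content, i)
            pending.append(text)
        elif ch == "%":  # comment runs to the end of the line
            while i < n and content[i] not in "\r\n":
                i += 1
        elif content.startswith("<<", i):  # dictionary start
            i += 2
        elif ch == "<":  # hex string: skipped in this sketch
            end = content.find(">", i)
            i = n if end == -1 else end + 1
        elif ch.isspace() or ch in "[]>)":
            i += 1
        else:
            m = _TOKEN.match(content, i)
            if m is None:
                i += 1
                continue
            token, i = m.group(), m.end()
            if token in TEXT_SHOWING:
                pieces.append("".join(pending))
                pending.clear()
            elif token.startswith("/") or _NUMBER.fullmatch(token):
                pass  # name or number operand: not text, ignore
            else:
                pending.clear()  # any other operator: drop its operands
    return " ".join(pieces)


if __name__ == "__main__":
    stream = "BT /F1 12 Tf 72 712 Td (Hello) Tj [(Wor) -20 (ld)] TJ ET"
    print(extract_text(stream))
```

The words returned this way could then be fed to the indexer in the same way
as the tokens from a plain text file. Real PDFs need more than this, in
particular stream decompression and per-font character maps, which is where a
proper PDF library comes in.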