[fpc-pascal] PDF indexing

Marc Santhoff M.Santhoff at web.de
Wed Jun 24 08:47:20 CEST 2015


On Wed, 2015-06-24 at 08:13 +0200, Michael Van Canneyt wrote:
> 
> On Wed, 24 Jun 2015, Marc Santhoff wrote:
> 
> > On Tue, 2015-06-23 at 09:10 +0200, Michael Van Canneyt wrote:
> >>
> >> On Tue, 23 Jun 2015, Marc Santhoff wrote:
> >>
> >>> On Sun, 2015-06-21 at 00:33 +0200, Michael Van Canneyt wrote:
> >>>>
> >>>> On Sat, 20 Jun 2015, Marc Santhoff wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> does fpc (or lazarus) have a helper class for indexing the content of
> >>>>> PDF files?
> >>>>
> >>>> Check packages/fpindexer.
> >>>>
> >>>> I have used it to create full text searches on a database.
> >>>> You should be able to adapt the base code to create an index of a PDF.
> >>>
> >>> That looks pretty interesting. And it even has some docs, wow.
> >>>
> >>> If I understand correctly, I'd only have to implement a class TIReaderPDF,
> >>> and the only difference from plain text reading is the part that extracts
> >>> a text stream, or the text parts of the stream, while rejecting the PDF
> >>> commands (if they are in there; I need to look at PowerPDF).
> >>
> >> Yes, that would be correct.
> >
> > Many thanks, Michael.
> >
> > Currently I'm searching for a PDF access library that could help with that.
> > The only halfway suitable one so far is this one:
> >
> > http://itextpdf.com/functionality
> >
> > It's open source, but under a license similar to the LGPL without a
> > linking exception. Still searching ...
> 
> But that's Java or .NET.

Really? Oops, I overlooked that. I thought there was a C version.

> Depending on your platform, you could try the Gnostice products. It's Delphi code,
> but they are quite open, and I was told the upcoming rework of their products
> will make Lazarus support possible.

I'll have a look at that one. Pure Object Pascal would be very nice.

I also found poppler, a fork of xpdf that compiles into a library.
It's hosted at freedesktop.org, but note that it is GPL (inherited from
xpdf), not LGPL.

Another solution would be to use TProcess and an external tool; there
are several "pdf2txt" variants out there.
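A minimal sketch of the TProcess route, assuming poppler's `pdftotext` command-line tool is installed and on the PATH (passing `-` as the output file makes it write the extracted text to standard output). It uses `RunCommand` from FPC's `process` unit, which wraps a TProcess and captures the child's output. The program name `PdfToTextDemo` and the function `ExtractPdfText` are hypothetical, made up for illustration; the returned string could then be fed to a text reader for indexing.

```pascal
program PdfToTextDemo;

{$mode objfpc}{$H+}

uses
  SysUtils, Process;

{ Hypothetical helper: runs the external converter and captures its stdout.
  Assumes poppler's "pdftotext" is on the PATH; "-" as output file name
  makes it write the extracted text to standard output. }
function ExtractPdfText(const FileName: string; out PlainText: string): Boolean;
begin
  PlainText := '';
  { Do not bother starting the tool for a file that is not there. }
  if not FileExists(FileName) then
    Exit(False);
  { RunCommand starts the program, collects everything it writes to
    stdout and returns False if it could not be run or exited non-zero. }
  Result := RunCommand('pdftotext', [FileName, '-'], PlainText);
end;

var
  S: string;
begin
  if ParamCount < 1 then
  begin
    WriteLn('usage: pdftotextdemo file.pdf');
    Halt(1);
  end;
  if ExtractPdfText(ParamStr(1), S) then
    Write(S)
  else
    WriteLn(StdErr, 'could not extract text from ', ParamStr(1));
end.
```

Calling an external tool keeps licensing simple (no linking against GPL code), at the cost of a process start per file and a runtime dependency on the tool being installed.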

Thanks again,
Marc

-- 
Marc Santhoff <M.Santhoff at web.de>



