[fpc-pascal] Text scan in text files - (was: Full text scan - PDF files)

José Mejuto joshyfun at gmail.com
Mon Nov 1 21:52:13 CET 2010


Hello FPC-Pascal,

Monday, November 1, 2010, 8:40:45 PM, you wrote:

MD> On Mon, Nov 1, 2010 at 4:27 PM, Alberto Narduzzi
MD> <albertonarduzzi at yahoo.com> wrote:
>> Sorry,
>>
>>> I agree. But as I search for text within PDF files?
>>
>> I assumed true the following statement of yours...
>>
>> [Somebody can help me please?
>> I need to search strings in Text files using just FPC.]

MD> Yes, I changed my first mail because nobody answered me about search
MD> in PDF files!

Searching for a string directly in a PDF is far from trivial: you need
at least a parser and a decompressor. The decompressor is usually not a
problem, as most files have their streams compressed with zlib, but
the parser is very complex. Locating the text zones is easy (more or
less), but if the strings contain international characters you have to
deal with CMaps, which is a nightmare.
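
To give an idea of the decompressor half only, here is a minimal sketch
that inflates a zlib-compressed buffer with the zstream unit shipped
with FPC. It assumes you have already located the raw stream bytes,
which is the parser's job and is not shown. InflateToString is a
made-up helper name and the sample content is invented for the demo;
the program compresses it itself only so that the example is
self-contained.

```pascal
program InflateDemo;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, zstream;

// Hypothetical helper: inflates zlib-wrapped data read from Source
// and returns the decompressed bytes as a string.
function InflateToString(Source: TStream): string;
var
  Z: TDecompressionStream;
  Buf: array[0..4095] of Byte;
  Got: LongInt;
  Chunk: string;
begin
  Result := '';
  Z := TDecompressionStream.Create(Source);
  try
    repeat
      Got := Z.Read(Buf, SizeOf(Buf));
      if Got > 0 then
      begin
        SetString(Chunk, PAnsiChar(@Buf[0]), Got);
        Result := Result + Chunk;
      end;
    until Got <= 0;
  finally
    Z.Free;
  end;
end;

var
  Deflated: TMemoryStream;
  C: TCompressionStream;
  Sample: string;
begin
  // Invented sample standing in for the content of a PDF stream.
  Sample := 'BT (Hello FPC-Pascal) Tj ET';
  Deflated := TMemoryStream.Create;
  try
    C := TCompressionStream.Create(clDefault, Deflated);
    try
      C.WriteBuffer(Sample[1], Length(Sample));
    finally
      C.Free; // freeing the compressor flushes the remaining data
    end;
    Deflated.Position := 0;
    WriteLn(InflateToString(Deflated));
  finally
    Deflated.Free;
  end;
end.
```

Even with the stream inflated, you still have to interpret the text
operators and the font encodings, which is where the real work is.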

So if you are looking for ASCII words, convert the file with PDF2Text
and run the Pos function over the result:

// Needs Classes (TFileStream) and SysUtils (UpperCase, fmOpenRead)
// in the uses clause.
function HaveString(Filename: String; TheString: string): Boolean;
var
  F: TFileStream;
  S: String;
begin
  Result:=false;
  F:=TFileStream.Create(Filename,fmOpenRead);
  try
    if F.Size>0 then
    begin
      // Read the whole file into memory
      SetLength(S,F.Size);
      F.ReadBuffer(S[1],F.Size);
      // Upper-case both sides so the search ignores ASCII case
      Result:=Pos(UpperCase(TheString),UpperCase(S))>0;
    end;
  finally
    F.Free;
  end;
end;
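
The function above loads the whole file into memory, which is fine for
small files. For big ones, a possible variation is to scan the stream
block by block and carry the last Length(TheString)-1 characters over
to the next block, so a match split across two blocks is not missed.
This is only a sketch: StreamContainsText, DemoSearch and the BlockSize
parameter are names I made up for the example.

```pascal
program ChunkedSearch;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: True when Needle occurs in Stream, ignoring
// ASCII case. Reads BlockSize bytes at a time.
function StreamContainsText(Stream: TStream; const Needle: string;
  BlockSize: Integer = 65536): Boolean;
var
  UpNeedle, Buf, Window, Carry: string;
  Got: LongInt;
begin
  Result := False;
  UpNeedle := UpperCase(Needle);
  if UpNeedle = '' then
    Exit;
  if BlockSize < 1 then
    BlockSize := 1;
  Carry := '';
  SetLength(Buf, BlockSize);
  repeat
    Got := Stream.Read(Buf[1], BlockSize);
    if Got <= 0 then
      Break;
    // Search the tail of the previous block plus the new block
    Window := Carry + UpperCase(Copy(Buf, 1, Got));
    if Pos(UpNeedle, Window) > 0 then
      Exit(True);
    // Keep the last Length(UpNeedle)-1 characters for the next round
    if Length(Window) >= Length(UpNeedle) then
      Carry := Copy(Window, Length(Window) - Length(UpNeedle) + 2,
        Length(UpNeedle) - 1)
    else
      Carry := Window;
  until False;
end;

// Hypothetical wrapper so the demo needs no file on disk.
function DemoSearch(const Haystack, Needle: string;
  BlockSize: Integer): Boolean;
var
  S: TStringStream;
begin
  S := TStringStream.Create(Haystack);
  try
    Result := StreamContainsText(S, Needle, BlockSize);
  finally
    S.Free;
  end;
end;

begin
  // A tiny block size forces the match to span several blocks.
  WriteLn(DemoSearch('Hello FPC-Pascal', 'pascal', 4));
  WriteLn(DemoSearch('Hello FPC-Pascal', 'delphi', 4));
end.
```

With a real file you would pass a TFileStream opened with fmOpenRead
instead of the TStringStream used in the demo.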

-- 
Best regards,
 José



