[fpc-pascal] Text scan in text files - (was: Full text scan - PDF files)
José Mejuto
joshyfun at gmail.com
Tue Nov 2 14:38:56 CET 2010
Hello FPC-Pascal,
Tuesday, November 2, 2010, 11:02:18 AM, you wrote:
TH> If I understand it correctly, this assumes reading the whole file into
TH> memory at once. Depending on the size of that file and other conditions,
TH> this may or may not be advisable...
Yes, and a pdf2text conversion will reduce the PDF file to roughly 1% of
its original size, so unless you handle 10-gigabyte PDFs there should be
no problem loading the whole text file into memory.
I doubt there will be memory problems, since running pdf2text will
surely consume more memory than the size of the resulting file.
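
A minimal sketch of the load-everything approach, assuming the pdf2text
output is a plain single-byte text file and a case-sensitive match is
wanted. FindInFile is a made-up helper name, not something from an
existing unit:

```pascal
program WholeFileSearch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: loads the whole file into one string and returns
// the 1-based position of the first match, or 0 if there is none.
function FindInFile(const FileName, Needle: string): SizeInt;
var
  Stream: TFileStream;
  Content: string;
begin
  Content := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    SetLength(Content, Stream.Size);
    if Length(Content) > 0 then
      Stream.ReadBuffer(Content[1], Length(Content));
  finally
    Stream.Free;
  end;
  Result := Pos(Needle, Content);
end;

begin
  if ParamCount <> 2 then
  begin
    WriteLn('usage: wholefilesearch <file> <text>');
    Halt(1);
  end;
  WriteLn(FindInFile(ParamStr(1), ParamStr(2)));
end.
```

Memory use is about the size of the text file, which is the trade-off
being discussed here.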
Of course, if you end up with 300-megabyte text files, a different
approach would be needed: reading through a buffer that acts as a
sliding window larger than the searched text.
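
One way the buffered window could look, as a rough sketch only. It
reads the stream in fixed-size chunks and carries the last
Length(Needle) - 1 bytes over to the next round, so a match that
straddles a chunk boundary is still found. FindInStream and the 64 KB
chunk size are assumptions for illustration, and the match is a plain
case-sensitive byte comparison:

```pascal
program WindowSearch;
{$mode objfpc}{$H+}

uses
  Classes, SysUtils;

// Hypothetical helper: returns the 0-based offset of the first match,
// counted from the start of the stream, or -1 if there is none.
function FindInStream(Stream: TStream; const Needle: string;
  ChunkSize: Longint): Int64;
var
  Chunk, Tail, Window: string;
  Got: Longint;
  Hit, Keep: SizeInt;
  WindowStart: Int64; // stream offset of Window[1]
begin
  Result := -1;
  if (Needle = '') or (ChunkSize <= 0) then
    Exit;
  Chunk := '';
  Tail := '';
  SetLength(Chunk, ChunkSize);
  WindowStart := Stream.Position;
  repeat
    Got := Stream.Read(Chunk[1], ChunkSize);
    if Got <= 0 then
      Break;
    // Window = carried-over tail + the bytes just read.
    Window := Tail + Copy(Chunk, 1, Got);
    Hit := Pos(Needle, Window);
    if Hit > 0 then
      Exit(WindowStart + Hit - 1);
    // Keep just enough bytes to catch a match split across two reads.
    Keep := Length(Needle) - 1;
    if Keep > Length(Window) then
      Keep := Length(Window);
    Tail := Copy(Window, Length(Window) - Keep + 1, Keep);
    WindowStart := WindowStart + Length(Window) - Keep;
  until False;
end;

var
  F: TFileStream;
begin
  if ParamCount <> 2 then
  begin
    WriteLn('usage: windowsearch <file> <text>');
    Halt(1);
  end;
  F := TFileStream.Create(ParamStr(1), fmOpenRead or fmShareDenyWrite);
  try
    WriteLn(FindInStream(F, ParamStr(2), 64 * 1024));
  finally
    F.Free;
  end;
end.
```

Memory use stays near ChunkSize plus the length of the searched text,
whatever the file size. Copying the window on every round is wasteful
but keeps the sketch short.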
The logic will also differ depending on whether you want to match one
word, several words, long sentences, sequences of chars, etc.
--
Best regards,
José