[fpc-pascal] Fast HTML Parser

Mark Morgan Lloyd markMLl.fpc-pascal at telemetry.co.uk
Wed Aug 6 22:46:17 CEST 2014


Marcos Douglas wrote:
> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
> <rainerstratmann at t-online.de> wrote:
>>  On Wednesday 06 August 2014 19:50:44 you wrote:
>>> Hi,
>>>
>>> Someone knows a fast html parser to use in Pascal code?
>>>
>>> I need something like this:
>>>
>>> HTML:
>>> <select name="sel_x">
>>> <option>1</option>
>>> <option>2</option>
>>> </select>
>>>
>>> I need a function/object to give me only the values:
>>> 1
>>> 2
>>>
>>> Something like:
>>> S := GetHTMLValues('sel_x');
>> It's not that difficult to write yourself.
> 
> You're right. But I'm searching the faster HTML parser to use in huge
> HTML files... thousands of files.

I disagree: it's damn difficult if one isn't working with tightly 
constrained input, and the original question says HTML without 
specifying it's a subset.

There are a couple of places where I parse HTML files that I've created 
myself, i.e. I know exactly what's in them, using, basically, a simple 
recursive-descent parser with some rather flexible ideas about what 
counts as a comment (i.e. in the above example, name="sel_x" could be 
discarded as if it were a comment). However, if I'm doing a brute-force 
job over a large number of files I usually use Lynx as a preprocessor, 
which lets me use standard text-processing utilities to pull named rows 
out of tabulated reports.
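
To illustrate the "tightly constrained input" case only, here is a 
minimal sketch in Free Pascal. It is not the recursive-descent parser 
described above and not a general HTML parser: it is a plain string scan 
that assumes lower-case tags, a double-quoted name attribute written 
first in the <select> tag, no comments and no nested selects. The 
procedure name is borrowed from the original question; its signature is 
hypothetical.

```pascal
program OptionValues;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

{ Hypothetical helper: collects the text of every <option> inside the
  <select> whose name attribute equals SelectName.
  ASSUMPTIONS: lower-case tags, name="..." is the first attribute,
  no HTML comments, no nested selects. Anything else needs a real parser. }
procedure GetHTMLValues(const Html, SelectName: string; Values: TStrings);
var
  SelStart, SelEnd, P, TagEnd, CloseTag: SizeInt;
begin
  Values.Clear;
  SelStart := Pos('<select name="' + SelectName + '"', Html);
  if SelStart = 0 then
    Exit;
  SelEnd := PosEx('</select>', Html, SelStart);
  if SelEnd = 0 then
    Exit;

  { Walk the <option> elements that start before the closing </select>. }
  P := PosEx('<option', Html, SelStart);
  while (P > 0) and (P < SelEnd) do
  begin
    TagEnd := PosEx('>', Html, P);            { end of the opening tag }
    CloseTag := PosEx('</option>', Html, P);  { start of the closing tag }
    if (TagEnd = 0) or (CloseTag = 0) or
       (TagEnd > CloseTag) or (CloseTag > SelEnd) then
      Break;                                  { input broke our assumptions }
    Values.Add(Trim(Copy(Html, TagEnd + 1, CloseTag - TagEnd - 1)));
    P := PosEx('<option', Html, CloseTag);
  end;
end;

const
  Sample =
    '<select name="sel_x">' + #10 +
    '<option>1</option>' + #10 +
    '<option>2</option>' + #10 +
    '</select>';

var
  Values: TStringList;
  I: Integer;
begin
  Values := TStringList.Create;
  try
    GetHTMLValues(Sample, 'sel_x', Values);
    for I := 0 to Values.Count - 1 do
      WriteLn(Values[I]);   { one value per line }
  finally
    Values.Free;
  end;
end.
```

On the sample from the original question this prints 1 and 2 on 
separate lines. As soon as the input has upper-case tags, reordered 
attributes or commented-out markup, the scan gives up or misses values, 
which is the point about unconstrained HTML made above.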

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


