[fpc-pascal] Fast HTML Parser
Mark Morgan Lloyd
markMLl.fpc-pascal at telemetry.co.uk
Wed Aug 6 22:46:17 CEST 2014
Marcos Douglas wrote:
> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
> <rainerstratmann at t-online.de> wrote:
>> On Wednesday 06 August 2014 19:50:44 you wrote:
>>> Hi,
>>>
>>> Someone knows a fast html parser to use in Pascal code?
>>>
>>> I need something like this:
>>>
>>> HTML:
>>> <select name="sel_x">
>>> <option>1</option>
>>> <option>2</option>
>>> </select>
>>>
>>> I need a function/object to give me only the values:
>>> 1
>>> 2
>>>
>>> Something like:
>>> S := GetHTMLValues('sel_x');
>> It's not that difficult to write yourself.
>
> You're right. But I'm searching the faster HTML parser to use in huge
> HTML files... thousands of files.

I disagree: it's damn difficult if one isn't working with tightly
constrained input, and the original question says HTML without
specifying that it's a subset.

There are a couple of places where I parse HTML files that I've created
myself, i.e. I know exactly what's in them, using what is basically a
simple recursive-descent parser with some rather flexible ideas about
comments (e.g. in the above example, name="sel_x" could be lost as a
comment).

However, if I'm doing a brute-force job over a large number of files I
usually use Lynx as a preprocessor, which allows me to use standard
text-processing utilities to pull named rows out of tabulated reports.
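
To illustrate the tightly-constrained case only, here is a minimal sketch
of the kind of scanner that copes with the example from the original
question. It is not my actual parser and it is not a general HTML parser:
it assumes lower-case tags, the attribute written exactly as name="...",
<option> tags without attributes, and no nested selects. GetHTMLValues is
a hypothetical helper named after the call in the question, with the HTML
text added as an extra parameter.

```pascal
program OptionValues;

{$mode objfpc}{$H+}

uses
  Classes, SysUtils, StrUtils;

{ Hypothetical helper, named after the call in the original question.
  Only valid for tightly constrained input (see the assumptions above).
  The caller owns the returned list and must free it. }
function GetHTMLValues(const Html, SelectName: string): TStringList;
var
  SelStart, SelEnd, P, Q: Integer;
begin
  Result := TStringList.Create;

  { Find the opening tag of the named select; return an empty list if absent. }
  SelStart := Pos('<select name="' + SelectName + '"', Html);
  if SelStart = 0 then
    Exit;

  { Limit the scan to this select; tolerate a missing closing tag. }
  SelEnd := PosEx('</select>', Html, SelStart);
  if SelEnd = 0 then
    SelEnd := Length(Html) + 1;

  { Collect the text between each <option> ... </option> pair. }
  P := PosEx('<option>', Html, SelStart);
  while (P > 0) and (P < SelEnd) do
  begin
    Inc(P, Length('<option>'));
    Q := PosEx('</option>', Html, P);
    if (Q = 0) or (Q > SelEnd) then
      Break;
    Result.Add(Trim(Copy(Html, P, Q - P)));
    P := PosEx('<option>', Html, Q);
  end;
end;

const
  Sample =
    '<select name="sel_x">' + LineEnding +
    '<option>1</option>' + LineEnding +
    '<option>2</option>' + LineEnding +
    '</select>';

var
  Values: TStringList;
  I: Integer;
begin
  Values := GetHTMLValues(Sample, 'sel_x');
  try
    for I := 0 to Values.Count - 1 do
      WriteLn(Values[I]);
  finally
    Values.Free;
  end;
end.
```

Run against the sample it writes 1 and 2 on separate lines. Anything
outside those assumptions (upper-case tags, single quotes, attributes on
<option>, comments wrapped around the markup) will defeat it, which is the
point I was making about unconstrained HTML.
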
--
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk
[Opinions above are the author's, not those of his employers or colleagues]