[fpc-pascal] Fast HTML Parser

Wed Aug 6 22:54:28 CEST 2014

On Wed, Aug 6, 2014 at 5:46 PM, Mark Morgan Lloyd
<markMLl.fpc-pascal at telemetry.co.uk> wrote:
> Marcos Douglas wrote:
>>
>> On Wed, Aug 6, 2014 at 2:54 PM, Rainer Stratmann
>> <rainerstratmann at t-online.de> wrote:
>>>
>>>  On Wednesday 06 August 2014 19:50:44 you wrote:
>>>>
>>>> Hi,
>>>>
>>>> Does anyone know of a fast HTML parser to use in Pascal code?
>>>>
>>>> I need something like this:
>>>>
>>>> HTML:
>>>> <select name="sel_x">
>>>> <option>1</option>
>>>> <option>2</option>
>>>> </select>
>>>>
>>>> I need a function/object to give me only the values:
>>>> 1
>>>> 2
>>>>
>>>> Something like:
>>>> S := GetHTMLValues('sel_x');
>>>
>>> It's not that difficult to write yourself.
>>
>>
>> You're right. But I'm looking for the fastest HTML parser to use on huge
>> HTML files... thousands of files.
>
>
> I disagree: it's damn difficult if one isn't working with tightly
> constrained input, and the original question says HTML without specifying
> it's a subset.
>
> There's a couple of places where I parse HTML files that I've created
> myself, i.e. I know exactly what's in them, using, basically, a simple
> recursive-descent parser with some rather flexible ideas about comments
> (i.e. in the above example, name="sel_x" could be lost as a comment).
> However if I'm doing a brute-force job over a large number of files I
> usually use Lynx as a preprocessor, which allows me to use standard
> text-processing utilities to pull named rows out of tabulated reports.

I know which tokens to search for, but the HTML files can be very different
from each other.
I can't use an external tool. The parsing needs to happen inside an
application (one that already exists).
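
To make the request concrete, here is a minimal sketch of the kind of
function I mean. It is only an illustration under strong assumptions:
attributes are double-quoted, and there are no comments or scripts between
the tags. It is not a real HTML parser, and it makes no claim about speed.
The extra AHtml and AValues parameters are my own addition for the example;
the GetHTMLValues('sel_x') call in my first mail only showed the name.

```pascal
program SelectValues;

{$mode objfpc}{$H+}

uses
  SysUtils, Classes, StrUtils;

// Hypothetical helper: collects the text of every <option> inside the
// first <select> whose name attribute equals AName (compared without
// regard to ASCII case). Naive string scanning, no real HTML parsing.
procedure GetHTMLValues(const AHtml, AName: string; AValues: TStrings);
var
  Lower, Needle: string;
  SelStart, TagEnd, SelEnd, P, TextStart, TextEnd: SizeInt;
begin
  AValues.Clear;
  // LowerCase only touches A..Z, so offsets in Lower match AHtml.
  Lower := LowerCase(AHtml);
  Needle := 'name="' + LowerCase(AName) + '"';
  SelStart := Pos('<select', Lower);
  while SelStart > 0 do
  begin
    TagEnd := PosEx('>', Lower, SelStart);
    if TagEnd = 0 then
      Exit;
    SelEnd := PosEx('</select', Lower, TagEnd);
    if SelEnd = 0 then
      SelEnd := Length(Lower) + 1;   // unclosed <select>: scan to the end
    // Look for the name attribute inside the opening tag only.
    if Pos(Needle, Copy(Lower, SelStart, TagEnd - SelStart)) > 0 then
    begin
      P := PosEx('<option', Lower, TagEnd);
      while (P > 0) and (P < SelEnd) do
      begin
        TextStart := PosEx('>', Lower, P);
        if TextStart = 0 then
          Break;
        Inc(TextStart);              // first character after the tag
        TextEnd := PosEx('<', Lower, TextStart);
        if TextEnd = 0 then
          TextEnd := SelEnd;
        AValues.Add(Trim(Copy(AHtml, TextStart, TextEnd - TextStart)));
        P := PosEx('<option', Lower, TextEnd);
      end;
      Exit;
    end;
    SelStart := PosEx('<select', Lower, SelEnd);
  end;
end;

var
  Values: TStringList;
  I: Integer;
begin
  Values := TStringList.Create;
  try
    GetHTMLValues('<select name="sel_x"><option>1</option>' +
                  '<option>2</option></select>', 'sel_x', Values);
    for I := 0 to Values.Count - 1 do
      WriteLn(Values[I]);
  finally
    Values.Free;
  end;
end.
```

With the sample from my first mail this should print 1 and 2, one per line.
As Mark points out, real-world HTML (comments, single-quoted or unquoted
attributes, nested markup inside the options) will break a scanner like
this, which is why I am asking about an existing parser.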

Thanks,
Marcos Douglas