Marco van de Voort
marcov at stack.nl
Mon Mar 23 13:07:43 CET 2009
(Mailing list maintainer/Jonas: I sent a similar message from a non-subscribed
email address. It can be discarded, sorry.)
I need an HTML parser and am not in a hurry, so I decided to try FPC's
own first, in the hope that I can at least write some documentation and
examples for the wiki along the way.
The first project is simple (see the program below) and was run on FPC's
HTML documentation. It failed like this:
An unhandled exception occurred at $004284EC :
EDOMError : EDOMError in DOMDocument.CreateElement hr/0
$00411A86 THTMLTODOMCONVERTER__READERSTARTELEMENT, line 500 of
$0042648A TSAXREADER__DOSTARTELEMENT, line 738 of src/sax.pp
$004110DC THTMLREADER__ENTERNEWSCANNERCONTEXT, line 391 of
$00410C80 THTMLREADER__PARSE, line 358 of src/sax_html.pp
$0042612C TSAXREADER__PARSESTREAM, line 647 of src/sax.pp
$00411F3D READHTMLFILE, line 609 of src/sax_html.pp
$00411E91 READHTMLFILE, line 593 of src/sax_html.pp
$004015DE main, line 21 of saxattempt.dpr
Some debugging suggests that it fails on <hr/>. The doctype of the document
in question is:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
Some questions for the more xmlable:
1. Is this correct? I think <hr/> is XML notation rather than HTML notation.
2. Can I somehow convince (override) DOM to accept it? Modifying the
generator (tex4ht) might prove to be difficult. It could be genera
3. Is there a way to have line numbers in the exceptions? Modifying the
source with writeln's to find out which tag exactly goes wrong is a bit
Note that I'm already happy with pointers on where to start. It would also
be great if anybody is willing to share private examples or documentation.
sx : THTMLDocument;
if findfirst('*.html',faanyfile,d)=0 then
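
A minimal sketch of a test loop along these lines, assuming the ReadHTMLFile
routine from the sax_html unit (seen in the backtrace above) and the
THTMLDocument class from dom_html. The program name and the per-file
try/except are illustrative additions, not part of the original listing:

```pascal
program saxsketch;  { hypothetical name, for illustration only }

{$mode objfpc}{$H+}

uses
  SysUtils, DOM, DOM_HTML, SAX_HTML;

var
  d  : TSearchRec;
  sx : THTMLDocument;
begin
  { Walk over every HTML file in the current directory. }
  if FindFirst('*.html', faAnyFile, d) = 0 then
    repeat
      sx := nil;
      try
        try
          { Parse the file into a DOM tree. }
          ReadHTMLFile(sx, d.Name);
          WriteLn(d.Name, ': parsed');
        except
          { Report which file triggered the DOM error instead of aborting. }
          on e: EDOMError do
            WriteLn(d.Name, ': ', e.ClassName, ': ', e.Message);
        end;
      finally
        { Free is safe on nil, so this also covers a failed parse. }
        sx.Free;
      end;
    until FindNext(d) <> 0;
  FindClose(d);
end.
```

Catching EDOMError per file only narrows the failure down to a document. It
does not report the line or tag inside that document.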