[fpc-devel] fcl-xml
Marco van de Voort
marcov at stack.nl
Mon Mar 23 13:07:43 CET 2009
(Mailing list maintainer/Jonas: I sent a similar message from a non-subscribed
email address. It can be discarded, sorry.)
I needed an HTML parser and am not in a hurry, so I decided to try FPC's
own first, in the hope that I can at least write some documentation or
examples for the wiki along the way.
The first project is simple (see the program below); I ran it on FPC's HTML
documentation and noticed that it failed like this:
An unhandled exception occurred at $004284EC :
EDOMError : EDOMError in DOMDocument.CreateElement hr/0
  $004284EC
  $00411A86 THTMLTODOMCONVERTER__READERSTARTELEMENT, line 500 of src/sax_html.pp
  $0042648A TSAXREADER__DOSTARTELEMENT, line 738 of src/sax.pp
  $004110DC THTMLREADER__ENTERNEWSCANNERCONTEXT, line 391 of src/sax_html.pp
  $00410C80 THTMLREADER__PARSE, line 358 of src/sax_html.pp
  $0042612C TSAXREADER__PARSESTREAM, line 647 of src/sax.pp
  $00411F3D READHTMLFILE, line 609 of src/sax_html.pp
  $00411E91 READHTMLFILE, line 593 of src/sax_html.pp
  $004015DE main, line 21 of saxattempt.dpr
Some debugging suggests that it fails on <hr/>. The doctype of the document
in question is
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
Some questions for those more at home with XML:
1. Is this behaviour correct? I think <hr/> is XML notation rather than HTML notation?
2. Can I somehow convince (override) DOM to accept it? (since modifying the
generator (tex4ht) might prove to be difficult). It could be genera
3. Is there a way to get the line number of the offending input in the
exception? Modifying the source with writeln's to find out exactly which tag
goes wrong is a bit ugly.
Note that I'm already happy with pointers on where to start. If anybody is
willing to share private examples or documentation, that would be great too.
program saxattempt;
{$mode delphi}
uses
  Sax_HTML, SysUtils, Classes, DOM_HTML;
var
  d: TSearchRec;
  sx: THTMLDocument;
  htmls: TStringList;
begin
  htmls := TStringList.Create;
  if FindFirst('*.html', faAnyFile, d) = 0 then
  begin
    repeat
      WriteLn(d.Name);
      sx := THTMLDocument.Create;
      ReadHTMLFile(sx, d.Name);
      htmls.AddObject(d.Name, sx);
    until FindNext(d) <> 0;
    FindClose(d);
  end;
end.
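As a stopgap until question 3 is answered, a minimal sketch of the same loop that catches the exception per file, so that one bad document does not abort the whole run and the class and message are printed next to the file name. It only uses the ReadHTMLFile call from the program above plus standard try/except handling. The program name saxattempt_guarded is made up for illustration, and I am assuming that ReadHTMLFile fills in the document variable itself, so it starts out as nil here. This does not give the line number of the offending tag, only the file.

```pascal
program saxattempt_guarded;  { hypothetical name, not part of fcl-xml }
{$mode delphi}
uses
  Sax_HTML, SysUtils, Classes, DOM_HTML;
var
  d: TSearchRec;
  sx: THTMLDocument;
begin
  if FindFirst('*.html', faAnyFile, d) = 0 then
  begin
    repeat
      sx := nil;  { assumption: ReadHTMLFile assigns the document itself }
      try
        try
          ReadHTMLFile(sx, d.Name);
          WriteLn(d.Name, ': ok');
        except
          { report the failure and carry on with the next file }
          on E: Exception do
            WriteLn(d.Name, ': ', E.ClassName, ': ', E.Message);
        end;
      finally
        sx.Free;  { Free is safe on nil }
      end;
    until FindNext(d) <> 0;
    FindClose(d);
  end;
end.
```

Running this in the directory with the generated documentation should at least show how many of the files trip over the same element.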