[fpc-devel] fcl-xml

Mon Mar 23 13:07:43 CET 2009

(maillist maintainer/jonas: I wrote a similar message from a non-subscribed
email addr. It can be discarded, sorry)

I needed an HTML parser and am not in a hurry, so I decided to try FPC's
own first, in the hope that I can at least write some documentation or
examples for the wiki along the way.

The first project is simple (see the program below) and was run against FPC's
HTML documentation. It fails like this:

An unhandled exception occurred at $004284EC :
EDOMError : EDOMError in DOMDocument.CreateElement hr/0
  $004284EC
  $00411A86  THTMLTODOMCONVERTER__READERSTARTELEMENT,  line 500 of src/sax_html.pp
  $0042648A  TSAXREADER__DOSTARTELEMENT,  line 738 of src/sax.pp
  $004110DC  THTMLREADER__ENTERNEWSCANNERCONTEXT,  line 391 of src/sax_html.pp
  $00410C80  THTMLREADER__PARSE,  line 358 of src/sax_html.pp
  $0042612C  TSAXREADER__PARSESTREAM,  line 647 of src/sax.pp
  $00411F3D  READHTMLFILE,  line 609 of src/sax_html.pp
  $00411E91  READHTMLFILE,  line 593 of src/sax_html.pp
  $004015DE  main,  line 21 of saxattempt.dpr

Some debugging suggests that it fails on <hr/>. The doctype of the document in
question is

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

Some questions for the more xmlable:
1. Is this correct? I think <hr/> is XML notation rather than HTML notation.
2. Can I somehow convince (override) DOM to accept it? (since modifying the
generator (tex4ht) might prove to be difficult). It could be genera
3. Is there a way to have line numbers in the exceptions? Modifying the
source with writeln's to find out which tag exactly goes wrong is a bit
ugly.

Note that I'm already happy with pointers on where to start. If anybody is
willing to share private examples or documentation, that would be great too.

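program saxattempt;

{$mode delphi}

uses
  SysUtils, Classes, DOM_HTML, SAX_HTML;

var
  d: TSearchRec;
  sx: THTMLDocument;
  htmls: TStringList;
  i: Integer;
begin
  htmls := TStringList.Create;
  try
    if FindFirst('*.html', faAnyFile, d) = 0 then
    begin
      repeat
        WriteLn(d.Name);
        // ReadHTMLFile creates the document itself and returns it in sx,
        // so no THTMLDocument.Create is needed beforehand.
        ReadHTMLFile(sx, d.Name);
        htmls.AddObject(d.Name, sx);
      until FindNext(d) <> 0;
      FindClose(d);
    end;
  finally
    // Free the parsed documents, then the list that holds them.
    for i := 0 to htmls.Count - 1 do
      htmls.Objects[i].Free;
    htmls.Free;
  end;
end.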
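
Regarding questions 2 and 3, a possible stopgap that leaves both fcl-xml and
the generator untouched is to rewrite <hr/> as <hr> in memory before parsing,
and to catch EDOMError per file so that the failing file is at least named.
The sketch below is untested against the documentation set in question. The
program name saxworkaround and the helper ReadHtmlLenient are hypothetical
names, and the assumption that only <hr/> trips the parser comes solely from
the trace above. It reports the file name, not a line number.

```pascal
program saxworkaround;

{$mode delphi}

uses
  SysUtils, Classes, DOM, DOM_HTML, SAX_HTML;

// Hypothetical helper: load the file, rewrite <hr/> as <hr>, and parse the
// result from memory instead of from disk.
function ReadHtmlLenient(const FileName: string): THTMLDocument;
var
  Lines: TStringList;
  Src: TStream;
begin
  Result := nil;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(FileName);
    // Assumption: <hr/> is the only self-closing tag that causes trouble.
    Src := TStringStream.Create(
      StringReplace(Lines.Text, '<hr/>', '<hr>', [rfReplaceAll, rfIgnoreCase]));
    try
      try
        ReadHTMLFile(Result, Src);
      except
        // Do not leak a half-built document if parsing fails.
        Result.Free;
        Result := nil;
        raise;
      end;
    finally
      Src.Free;
    end;
  finally
    Lines.Free;
  end;
end;

var
  Rec: TSearchRec;
  Doc: THTMLDocument;
begin
  if FindFirst('*.html', faAnyFile, Rec) = 0 then
  begin
    repeat
      try
        Doc := ReadHtmlLenient(Rec.Name);
        try
          WriteLn(Rec.Name, ': parsed');
        finally
          Doc.Free;
        end;
      except
        // Name the offending file and carry on with the next one.
        on E: EDOMError do
          WriteLn(Rec.Name, ': ', E.ClassName, ': ', E.Message);
      end;
    until FindNext(Rec) <> 0;
    FindClose(Rec);
  end;
end.
```

If other self-closing tags (for example <br/>) turn out to fail the same way,
the StringReplace call would need to be extended accordingly.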