[fpc-pascal] XML DOM and HTML
Johannes Nohl
johannes.nohl at gmail.com
Sun Jun 8 16:31:56 CEST 2008
Dear list, dear Michael!
> There are multiple problems with HTML parsing: HTML is not a well-formed
> XML document, because
> - the tags are case insensitive (in XML they are case sensitive)
> - not all tags have to be closed.
> If the HTML is XHTML, then the DOM unit can be used to parse it.
But how do I retrieve more than the first part of the node's value?
If I read in:
<div>
asdf1
<span>qwer1</span>
asdf2
<img src="" />
asdf3
</div>
FindNode('div').NodeValue returns "asdf1", but not asdf2 or asdf3.
Isn't the example above valid XHTML?
Am I wrong?
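A minimal sketch of what the parser builds from this fragment, assuming Free Pascal's DOM and XMLRead units and parsing from a string (whitespace removed so the text nodes are easy to read). CollectText and DivTextOf are hypothetical helpers, not part of the DOM unit. In the W3C DOM model the div's text is not stored on the div itself. It is split across separate text-node children ("asdf1", "asdf2", "asdf3") interleaved with the span and img elements, so a lookup that stops at the first text node sees only "asdf1".

```pascal
program DivText;

{$mode objfpc}{$H+}

uses
  Classes, DOM, XMLRead;

{ Hypothetical helper: concatenate the values of all text (and CDATA)
  nodes below ANode in document order, descending into child elements
  such as <span>. }
function CollectText(ANode: TDOMNode): DOMString;
var
  Child: TDOMNode;
begin
  Result := '';
  Child := ANode.FirstChild;
  while Child <> nil do
  begin
    case Child.NodeType of
      TEXT_NODE, CDATA_SECTION_NODE:
        Result := Result + Child.NodeValue;
      ELEMENT_NODE:
        Result := Result + CollectText(Child);
    end;
    Child := Child.NextSibling;
  end;
end;

{ Hypothetical helper: parse a fragment held in a string, print the
  direct children of its 'div' element, and return the div's full text. }
function DivTextOf(const XML: string): DOMString;
var
  Doc: TXMLDocument;
  Input: TStream;  { declared as TStream so it matches ReadXMLFile's stream parameter }
  DivNode, Child: TDOMNode;
begin
  Input := TStringStream.Create(XML);
  try
    ReadXMLFile(Doc, Input);
  finally
    Input.Free;
  end;
  try
    { FindNode searches direct children only; here the div is the root element. }
    DivNode := Doc.FindNode('div');
    Child := DivNode.FirstChild;
    while Child <> nil do
    begin
      { Text nodes print as '#text' with their text; elements carry no text of their own. }
      WriteLn(Child.NodeName, ' "', Child.NodeValue, '"');
      Child := Child.NextSibling;
    end;
    Result := CollectText(DivNode);
  finally
    Doc.Free;
  end;
end;

begin
  WriteLn(DivTextOf('<div>asdf1<span>qwer1</span>asdf2<img src="" />asdf3</div>'));
end.
```

Walking FirstChild/NextSibling and concatenating the text nodes is one way to gather the whole content. With the original indented layout, the text nodes would also contain the surrounding newlines and spaces.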