[fpc-devel] XML Parser problems with C-Data and Character Encoding

Mattias Gaertner nc-gaertnma at netcologne.de
Thu Sep 27 11:19:21 CEST 2012


Andrew Brunner <atbrunner at aurawin.com> wrote on 26 September 2012 at 21:21:
>[...]
> If c-data contains any character whose ordinal value is greater than 127
> the XML parser raises an exception.

Have you tried setting the right encoding in the xml?


>[...]
> Is there ANY case where dropping c-data is OK just because a user
> hasn't entered it? I'm just curious... I can't seem to find any good
> reason as to why this happened other than to force UTF or some other
> encoding on everyone.
>
> The XML parser already has options for Validation. I'm hoping to get
> someone on the team to add a one line if Validation=true then check the
> values else just parse it already.
>
> http://mantis.freepascal.org/view.php?id=22990

Sergei's note is correct:

---snip---
This is unrelated to validation. XML parser always checks that its input data
conforms to the specified encoding.
The XML data should either be in utf8 encoding, or in utf16 with a BOM character
at the beginning, or it should be labeled with 'encoding' attribute
(additionally xmliconv unit must be added to uses clause if encoding is
different from iso8859-1).
---snap---
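
To illustrate the point about labeling, here is a minimal sketch in Python rather than Pascal (an assumption on my part: it uses Python's expat-based parser as a stand-in, not the FPC XML parser, so error types and messages will differ). It shows that the same ISO-8859-1 bytes inside a CDATA section are rejected when the document carries no encoding label, and accepted once the 'encoding' attribute is present. The variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Text with characters above 127, stored as ISO-8859-1 bytes.
text = "Grüße"
latin1_bytes = text.encode("iso-8859-1")

# Same payload twice: once unlabeled (parser assumes UTF-8), once labeled.
unlabeled = b"<doc><![CDATA[" + latin1_bytes + b"]]></doc>"
labeled = b'<?xml version="1.0" encoding="iso-8859-1"?>' + unlabeled

# Without a label the bytes are not valid UTF-8, so parsing fails.
try:
    ET.fromstring(unlabeled)
    unlabeled_ok = True
except ET.ParseError:
    unlabeled_ok = False

# With the label the parser decodes the CDATA content correctly.
labeled_text = ET.fromstring(labeled).text

print(unlabeled_ok)
print(labeled_text)
```

The alternative, of course, is to convert the data to UTF-8 before writing the XML, in which case no label is needed.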


> Another thing would be to have an exception level event. There is an
> event there, but still the exception is raised the event is called -
> stopping the parsing and throwing a wrench in my streaming application
> logic.

Maybe you can explain what you are trying to do?


> In a world where XML is just about the primary choice for data
> transmission - I think we take speed as the utmost priority and don't
> use such a pivotal technology as a tool to get what someone else wants.

AFAIK such statements seldom have a positive effect on volunteer projects.


> The bottom-line inference here is that WE ALL must encode ALL DATA
> before it can be streamed / parsed. That's bad news and EXPENSIVE.

Well, whatever you try, you must at least encode all < > and & characters.
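
As a rough sketch of what that minimum looks like for ordinary character data (again in Python as a stand-in, not FPC code; the sample string is invented), the standard-library helper below replaces only &, < and > and leaves everything else untouched, and the parser gives back the original text:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Raw text containing all three characters that must be encoded.
raw = "if a < b && b > c then"

# escape() replaces & first, then < and >; nothing else is changed.
escaped = escape(raw)

# Embed the escaped text in an element and parse it back.
doc = "<code>" + escaped + "</code>"
round_trip = ET.fromstring(doc).text

print(escaped)
print(round_trip == raw)
```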

Mattias