[fpc-devel] XML Components

Mattias Gaertner nc-gaertnma at netcologne.de
Fri Nov 2 14:44:35 CET 2012


Sergei Gorelkin <sergei_gorelkin at mail.ru> wrote on 2 November 2012 at 14:32:
> 02.11.2012 17:08, Michael Van Canneyt wrote:
> >
> >
> > On Fri, 2 Nov 2012, Andrew Brunner wrote:
> >
> >>
> >> I think it would be a good solution and might even prove faster in
> >> controlled environments. Plus, all data is stored as widestrings in the DOM.
> >>
> >> The first question is: if there were such an option, would the patch be
> >> accepted?
> >
> > I don't see how you can fix the problem. If the input is UTF-8 and the
> > result must be converted to a widestring for the DOM, then a conversion
> > MUST take place; there is no way to avoid it. And a conversion means
> > scanning the input byte for byte.
> >
> > In any case, the input must be scanned byte for byte anyway to detect all
> > the tags. That's what makes XML slow and unusable for large amounts of data.
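The byte-by-byte conversion described above can be sketched as a hand-written UTF-8 to UTF-16 decoder. This is a minimal Python illustration, not the FPC RTL routine (which this thread does not show); the function name `utf8_to_utf16_units` is hypothetical. Note that ESC (0x1B) is a valid one-byte UTF-8 sequence, so a correct converter passes it through unchanged as code unit 27.

```python
def utf8_to_utf16_units(data: bytes) -> list:
    """Decode UTF-8 bytes into UTF-16 code units, scanning one byte at a time.

    Hypothetical helper for illustration; raises ValueError on malformed input.
    """
    units = []
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        # The lead byte determines the sequence length, its payload bits,
        # and the smallest code point that length may legally encode.
        if lead < 0x80:
            cp, length, minimum = lead, 1, 0
        elif 0xC2 <= lead <= 0xDF:
            cp, length, minimum = lead & 0x1F, 2, 0x80
        elif 0xE0 <= lead <= 0xEF:
            cp, length, minimum = lead & 0x0F, 3, 0x800
        elif 0xF0 <= lead <= 0xF4:
            cp, length, minimum = lead & 0x07, 4, 0x10000
        else:
            raise ValueError(f"invalid lead byte 0x{lead:02X} at offset {i}")
        if i + length > n:
            raise ValueError(f"truncated sequence at offset {i}")
        # Each continuation byte must look like 10xxxxxx and adds 6 payload bits.
        for j in range(i + 1, i + length):
            cont = data[j]
            if cont & 0xC0 != 0x80:
                raise ValueError(f"bad continuation byte at offset {j}")
            cp = (cp << 6) | (cont & 0x3F)
        # Reject overlong forms, UTF-16 surrogates and values above U+10FFFF.
        if cp < minimum or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
            raise ValueError(f"invalid code point U+{cp:04X} at offset {i}")
        # Code points outside the BMP are emitted as a surrogate pair.
        if cp >= 0x10000:
            cp -= 0x10000
            units.append(0xD800 | (cp >> 10))
            units.append(0xDC00 | (cp & 0x3FF))
        else:
            units.append(cp)
        i += length
    return units


# ESC survives conversion: the encoding layer has nothing to reject here.
print(utf8_to_utf16_units(b"a\x1bb"))  # [97, 27, 98]
```

Every input byte is inspected exactly once, which is the unavoidable cost Michael points out; whether a character is acceptable in XML is a separate check made after decoding.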
> >
> >> The next question is: what is wrong with the utf8 routine, given that it
> >> left the offending byte sequence in my sample data intact instead of
> >> converting it?
> >
> > Without error message, it is impossible to tell.
> >
> In this case, the issue is not the encoding but the literal ESC (#27) code
> used in the data. The XML specification does not allow code points below 32,
> except TAB, CR and LF, to appear in data, in either literal or escaped form.

Actually, the specification only defines the legal characters and says that
processors must accept them. It does not say what to do with the other
characters.
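The "legal characters" discussed here are given by the `Char` production in section 2.2 of XML 1.0 (Fifth Edition): `#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`. A minimal Python sketch of a pre-check an application might run before handing text to a DOM; the helper names are hypothetical and not part of any FPC unit.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)          # TAB, LF, CR are the only controls allowed
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return (index, code point) of the first non-Char character, or None."""
    for index, ch in enumerate(text):
        if not is_xml10_char(ord(ch)):
            return index, ord(ch)
    return None


print(first_illegal_char("abc\x1bdef"))  # (3, 27): the ESC from the sample data
print(first_illegal_char("tab\tok"))     # None
```

A check like this reports where the problem is, but cannot make the data legal; that requires changing how the data is represented.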


> In other words, XML is the wrong technology for binary data, unless the data
> is encoded into textual form (Base64 or similar).

True.
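Encoding binary data as Base64 before storing it in XML can be sketched as follows. This minimal Python example uses the standard `base64` and `xml.etree.ElementTree` modules rather than the FPC XML units; the helper names `wrap_binary` and `unwrap_binary` are hypothetical. Base64 output contains only `A-Z`, `a-z`, `0-9`, `+`, `/` and `=`, all of which are legal XML characters, so bytes such as ESC or NUL round-trip safely.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(tag: str, payload: bytes) -> str:
    """Serialize raw bytes as Base64 text inside a single XML element."""
    elem = ET.Element(tag)
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Parse the element produced by wrap_binary and recover the raw bytes."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text or "")


payload = b"\x1b[1mbold\x1b[0m\x00\xff"  # ESC, NUL and a non-UTF-8 byte
doc = wrap_binary("data", payload)
print(doc)
assert unwrap_binary(doc) == payload
```

The cost is roughly a 4/3 size increase plus an extra encode/decode pass, which matters for the large-data case raised earlier in the thread.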

Mattias
