[fpc-devel] XML Components

Sergei Gorelkin sergei_gorelkin at mail.ru
Fri Nov 2 14:32:25 CET 2012


02.11.2012 17:08, Michael Van Canneyt wrote:
>
>
> On Fri, 2 Nov 2012, Andrew Brunner wrote:
>
>>
>> I think it would be a good solution and even prove faster in controlled environments.  Plus all
>> data is stored as widestrings in the DOM.
>>
>> The first question I have is: if there were such an option, would the patch be accepted?
>
> I don't see how you can fix the problem. If the input is UTF8, and the result must be converted to a
> widestring for the DOM, then a conversion MUST take place, there is no way to avoid it.
> And a conversion means scanning the input byte for byte.
>
> In each case, the input must be scanned byte for byte anyway, to detect all the tags. That's what
> makes XML slow and unusable for large amounts of data.
>
>> The next question is: what is the problem with the UTF-8 routine, that it left the offending byte
>> sequence intact without converting the bytes in my sample data?
>
> Without error message, it is impossible to tell.
>
In this case, the issue is not the encoding, but a literal ESC (#27) code used in the data. The XML
1.0 specification does not allow codepoints below 32, except TAB, CR and LF, to appear in data,
either in literal or in escaped form.
In other words, XML is the wrong technology for working with binary data, unless that data is first
encoded into a textual form (Base64 or similar).
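
To illustrate the point, here is a minimal sketch. It is written in Python with its standard
expat-based parser rather than with the FPC DOM units, purely as an assumption for demonstration;
the helper name is_well_formed and the sample payload are hypothetical. It checks that ESC is
rejected in both literal and escaped form, and that the same bytes survive a round trip once they
are Base64-encoded.

```python
import base64
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample data containing an ESC (#27) byte.
payload = b"\x1b[0m raw bytes"

# ESC as a literal character in element content.
literal_ok = is_well_formed("<data>\x1b[0m</data>")

# ESC written as a numeric character reference.
escaped_ok = is_well_formed("<data>&#27;[0m</data>")

# TAB, CR and LF are the only codepoints below 32 that are allowed.
whitespace_ok = is_well_formed("<data>\t\r\n</data>")

# Workaround: carry the bytes as Base64 text and decode after parsing.
encoded = base64.b64encode(payload).decode("ascii")
doc = "<data>" + encoded + "</data>"
roundtrip = base64.b64decode(ET.fromstring(doc).text)
```

The Base64 alphabet uses only letters, digits, '+', '/' and '=', so the encoded text needs no
further escaping inside element content.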

Regards,
Sergei


