[fpc-devel] XML Components

Jeppe Græsdal Johansen jjohan07 at student.aau.dk
Fri Nov 2 17:50:27 CET 2012


On 02-11-2012 14:32, Sergei Gorelkin wrote:
> On 02.11.2012 17:08, Michael Van Canneyt wrote:
>>
>>
>> On Fri, 2 Nov 2012, Andrew Brunner wrote:
>>
>>>
>>> I think it would be a good solution and even prove faster in 
>>> controlled environments.  Plus all
>>> data is stored as widestrings in the DOM.
>>>
>>> The first question I have is if there was such an option would the 
>>> patch be accepted.
>>
>> I don't see how you can fix the problem. If the input is UTF8, and 
>> the result must be converted to a
>> widestring for the DOM, then a conversion MUST take place, there is 
>> no way to avoid it.
>> And a conversion means scanning the input byte for byte.
>>
>> In each case, the input must be scanned byte for byte anyway, to 
>> detect all the tags. That's what
>> makes XML slow and unusable for large amounts of data.
>>
>>> The next question is what is the problem with the UTF-8 routine that 
>>> it left the offending byte
>>> sequence intact without converting the bytes in my sample data?
>>
>> Without error message, it is impossible to tell.
>>
> In this case, the issue is not encoding, but the literal ESC (#27) code 
> used in the data. The XML specification does not allow codepoints below 
> 32, except TAB, CR and LF, to appear in data, in either literal or 
> escaped form.
> In other words, XML is the wrong technology for working with binary 
> data, unless it is encoded into textual form (Base64 or the like).
>
> Regards,
> Sergei
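XML 1.1 allows characters all the way down to #1 (as far as I can tell, 
only as character references such as &#27;, not literally), but the 
current parser doesn't seem to allow that. I guess supporting it would 
solve most of the problems here.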
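
To make the difference between the two versions concrete, here is a rough sketch of the character rules as I read them from the Char and RestrictedChar productions of the XML 1.0 and XML 1.1 specs. It is written in Python only for brevity, the function names are made up for illustration, and it is not how the FPC parser is structured.

```python
# Hypothetical helper names; the ranges follow the Char / RestrictedChar
# productions of the XML 1.0 and XML 1.1 specifications.

def is_xml_char(cp: int, xml11: bool) -> bool:
    """True if code point cp may occur at all (literally or as a character reference)."""
    high_planes = (0xE000 <= cp <= 0xFFFD) or (0x10000 <= cp <= 0x10FFFF)
    if xml11:
        # XML 1.1: everything from #1 upwards, minus surrogates, #xFFFE and #xFFFF.
        return (0x1 <= cp <= 0xD7FF) or high_planes
    # XML 1.0: only TAB, LF and CR are permitted below #x20.
    return cp in (0x9, 0xA, 0xD) or (0x20 <= cp <= 0xD7FF) or high_planes


def is_restricted_char(cp: int) -> bool:
    """XML 1.1 RestrictedChar: legal only when written as a character reference."""
    return (
        (0x1 <= cp <= 0x8)
        or (0xB <= cp <= 0xC)
        or (0xE <= cp <= 0x1F)
        or (0x7F <= cp <= 0x84)
        or (0x86 <= cp <= 0x9F)
    )


def may_appear_literally(cp: int, xml11: bool) -> bool:
    """True if cp may appear unescaped in character data."""
    if not is_xml_char(cp, xml11):
        return False
    return not (xml11 and is_restricted_char(cp))
```

Under these rules ESC (#27) is rejected outright in XML 1.0, while in XML 1.1 it passes is_xml_char but not may_appear_literally, so it would have to be written as a character reference.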

Specifically, TXMLDecodingSource.SkipUntil rejects #1..#31 even when 
FXML11Rules is true; I think it should accept them in that case.
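
Until the parser handles that, the Base64 route Sergei mentions works with any XML version. A minimal sketch of the round trip, again in Python for brevity; wrap_binary and unwrap_binary are hypothetical names, and the element name is whatever the caller picks:

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(tag: str, payload: bytes) -> str:
    """Store arbitrary bytes as Base64 text inside a single element."""
    elem = ET.Element(tag)
    # Base64 output is plain ASCII, so no control characters reach the XML.
    elem.text = base64.b64encode(payload).decode("ascii")
    return ET.tostring(elem, encoding="unicode")


def unwrap_binary(xml_text: str) -> bytes:
    """Parse the element back and decode its Base64 text to the original bytes."""
    elem = ET.fromstring(xml_text)
    return base64.b64decode(elem.text)


if __name__ == "__main__":
    raw = b"\x1b[0mhello\x00\x01"  # contains ESC, NUL and #1
    doc = wrap_binary("blob", raw)
    print(doc)
    print(unwrap_binary(doc) == raw)
```

The cost is roughly a third more bytes on the wire, but the document stays well-formed no matter what the payload contains.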


