[fpc-pascal] XMLWrite loses data

Mattias Gaertner nc-gaertnma at netcologne.de
Tue Mar 25 02:17:11 CET 2014


On Mon, 24 Mar 2014 20:12:50 +0000
Graeme Geldenhuys <mailinglists at geldenhuys.co.uk> wrote:

>[...]
> > The parser converts &#*; to Unicode characters when
> > reading. AFAIR some xsl parsers like xsltproc do the same.
> > If you want xslt to output '&#xa0;' you can use '&amp;#xa0;'
> 
> Thanks for that info, it helped find the problem (though no solution
> yet). That character isn't actually a Unicode character, it is simply a
> "no-break space" character at position $A0 in the ASCII chart.

Well, I see that the term "character" is confusing here.
It is a Unicode code point. The &#xa0; is just an XML character reference
for it. For XML it does not matter whether you write it as a reference or
as the character itself encoded in UTF-8/UTF-16.
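
A quick way to check this equivalence, sketched in Python with the standard
library's xml.etree.ElementTree rather than fcl-xml. The element name "d" and
the sample date are made up for illustration:

```python
import xml.etree.ElementTree as ET

# The same text written three ways: hex reference, decimal reference,
# and the literal U+00A0 character.
as_hex = ET.fromstring("<d>25&#xa0;Mar&#xa0;2014</d>")
as_dec = ET.fromstring("<d>25&#160;Mar&#160;2014</d>")
as_raw = ET.fromstring("<d>25\u00a0Mar\u00a02014</d>")

# After parsing, all three carry the identical string.
assert as_hex.text == as_dec.text == as_raw.text
assert as_hex.text[2] == "\u00a0"

# In a UTF-8 file the literal form is just the two bytes 0xC2 0xA0.
assert "\u00a0".encode("utf-8") == b"\xc2\xa0"
```

Once parsed, nothing in the tree records which of the three spellings the
source file used, which is why a reader/writer round trip cannot be expected
to preserve it.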


> Using hex
> value notation, instead of the more popular decimal notation when escaped.
> 
> =======[ charmap details ]============
> U+00A0 NO-BREAK SPACE
> UTF-8: 0xC2 0xA0
> UTF-16: 0x00A0
> 
> C octal escaped UTF-8: \302\240
> XML decimal entity: &#160;
> =========================
> 
> But I now see what happened. When I enabled "show hidden characters"
> like spaces and tabs in my editor, I noticed that the no-break space
> character is still there, but in the resaved output file it is simply
> not escaped any more.

Yes. That's what I meant.

 
> How is the fcl-xml package supposed to handle escaped characters which
> will form part of the data the XSL will generate? Is fcl-xml supposed to
> write them back as escaped characters, or as a normal unescaped character?

XML writers can choose. Both forms are valid XML representations of the
same text.
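
As an illustration of a writer making that choice, here is a small sketch
using Python's xml.etree.ElementTree (not fcl-xml). The element name "d" and
the date are assumptions for the example. The output encoding decides whether
the no-break space is written as a reference or as the character itself:

```python
import xml.etree.ElementTree as ET

elem = ET.Element("d")
elem.text = "25\u00a0Mar\u00a02014"

# Serialising to ASCII bytes forces the writer to use character references,
# because U+00A0 cannot be represented in ASCII.
escaped = ET.tostring(elem, encoding="us-ascii")

# Serialising to a Unicode string keeps the literal character.
literal = ET.tostring(elem, encoding="unicode")

assert b"&#160;" in escaped
assert "\u00a0" in literal

# Both outputs parse back to the same text.
assert ET.fromstring(escaped).text == ET.fromstring(literal).text
```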

 
> I tried using the decimal notation too: &#160;
> And that produced the same result as the original.
> 
> Note:
> When we process an XML file with our XSL file, we want the resulting
> output to have a no-break character - we don't want to display the text
> '&#xa0;' - which I think is what your suggestion with the &amp; will produce.
> 
> To put this in context, in case my original XSL snippet wasn't clear.
> That snippet generates a date string in the format 'dd MMM yyyy' and the
> spaces between those elements are not normal spaces, but no-break
> spaces, so that the whole text stays together (and won't word-wrap in the
> middle).
> 
> 
> The current resaved XSL file still works, but not being able to
> physically see the no-break space characters could cause us problems
> months down the line when we re-edit those files. Hence the reason they
> were escaped (to make them clearly visible to the developer).

You can use comments.

The current XML writer only escapes '<', '>', '&', #0..#31.
Maybe you want to extend it with an option or hook to escape more
characters. For example all "control characters".
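
To make the idea concrete, here is a rough sketch of such a hook, written in
Python rather than against the real fcl-xml writer. The function name
escape_text and the also_escape parameter are hypothetical; the default set
mirrors the one listed above:

```python
def escape_text(text, also_escape=lambda ch: False):
    """Escape XML text; `also_escape` is a hypothetical hook for extra characters."""
    named = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}
    out = []
    for ch in text:
        if ch in named:
            out.append(named[ch])
        elif ord(ch) < 32 or also_escape(ch):
            # Numeric character reference in hex, e.g. U+00A0 -> &#xa0;
            out.append("&#x%x;" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


# Default behaviour: the no-break space is written as the raw character.
plain = escape_text("25\u00a0Mar <2014>")

# With the hook, every character above the printable ASCII range is
# written as a reference and so stays visible in the file.
visible = escape_text("25\u00a0Mar <2014>", also_escape=lambda ch: ord(ch) > 126)
```

With a predicate like this the caller decides which characters stay visible,
and the default output of the writer does not change for anyone else.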

Mattias


