[fpc-devel] fpdoc and unicode characters
Sergei Gorelkin
sergei_gorelkin at mail.ru
Thu Aug 14 14:24:32 CEST 2008
Graeme Geldenhuys wrote:
> On Thu, Aug 14, 2008 at 1:14 PM, Marco van de Voort <marcov at stack.nl> wrote:
>>> How does this argument fit with XML which also uses UTF-8 as the de
>>> facto standard encoding. And seeing that fpdoc uses XML for the
>>> documentation files, can I use the actual Unicode characters in my
>>> fpdoc documentation, or must I still stick with the (what now seems to
>>> be outdated) escaped method?
>> Depends. Is & a steering character in all of XML, or only the xhtml like
>> standards?
>
> I think only XHTML.
>
XML too. In XML, you *must* escape the ampersand (U+0026) and the less-than
sign (U+003C). The greater-than sign (U+003E) must also be escaped if it is
preceded by a ']]' sequence. Additionally, in attribute values, double quotes
(U+0022) must be escaped if they are used as value delimiters. The other
option is to delimit values with apostrophes (U+0027), in which case it is
the apostrophes inside the value that must be escaped.
Here I mean the XML file, not the DOM tree. You may freely use the
characters mentioned above as plain text while manipulating the DOM; the
writer will escape them on output.
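As an illustration of the file-versus-DOM distinction, the sketch below uses Python's standard xml.dom.minidom rather than FPC's own DOM and XML writer units; the element and attribute names are made up. Raw characters go into the tree, and escaping happens only when the tree is serialized. The exact set of characters a writer escapes can differ between implementations (and between Python versions), so only the round trip is checked.

```python
from xml.dom.minidom import Document, parseString

title = 'A "quoted" & <odd> value'
body = "if a < b && b > c then"

doc = Document()
root = doc.createElement("descr")
# Raw, unescaped values go into the tree; no manual escaping here.
root.setAttribute("title", title)
root.appendChild(doc.createTextNode(body))
doc.appendChild(root)

xml_text = doc.toxml()  # the writer escapes &, < (and possibly more) here
print(xml_text)

# Parsing the serialized text back yields the original characters.
again = parseString(xml_text).documentElement
assert again.getAttribute("title") == title
assert again.firstChild.data == body
```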
> But what is fpdoc's xml files? Pure XML, XHTML or some custom/hybrid
> format? The layout of fpdoc's files seems to be XML, but the documentation
> content seems some hybrid HTML - hence the confusion with what is
> allowed!
>
XHTML is XML with a defined 'vocabulary' (a DTD). The two formats have no
character-level differences.
> Anybody know the rules of strict XML files and Unicode? Can I use
> Unicode characters as data in XML nodes? I would imagine I may because
> most well-formed XML files specify UTF-8 as the encoding type.
>
> Also something I think has been resolved in recent versions, but in
> older 'makeskel' versions, it did not include the encoding in the
> generated .xml file. So what encoding are we supposed to treat such files
> as? Default to W3C standards and assume UTF-8? LCL and
> fpGUI's fpdoc documentation (mostly) has no encoding specified in the
> .xml files. FPC's documentation specifies ISO8859-1 as the encoding
> type, though I found one file (dateutils.xml) in FPC docs that hasn't
> got an encoding (but my doc update is out of date).
>
The W3C XML specification requires that an XML file without an encoding
label be treated as UTF-8, unless it starts with a UTF-16 byte order mark
(BOM), in which case it is treated as UTF-16. Therefore, a UTF-8 label is
optional.
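The default-encoding rule can be sketched as follows. This is a simplified illustration in Python, assuming the file is read as raw bytes; the function name detect_xml_encoding is hypothetical, and UTF-32 and the other corner cases covered by the XML specification's (non-normative) autodetection appendix are ignored.

```python
import codecs
import re

# Encoding pseudo-attribute of an XML declaration at the start of the data.
_DECL_RE = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def detect_xml_encoding(data: bytes) -> str:
    """Return the encoding an XML file should be read with (simplified)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "UTF-16"
    m = _DECL_RE.match(data)
    if m:
        # An explicit label wins when there is no BOM.
        return m.group(1).decode("ascii")
    # No BOM and no label: the document must be UTF-8.
    return "UTF-8"


print(detect_xml_encoding(b"<fpdoc-descriptions/>"))  # UTF-8
```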
Older versions of makeskel wrote an 'ISO8859-1' label, which, by the way, is
invalid (the IANA-registered names include ISO-8859-1 and ISO_8859-1, but
not ISO8859-1). Later, when the parser became more standards-compliant,
makeskel stopped writing a label at all. The parser still has a workaround
that accepts the ISO8859-1 label.
The XML writer always produces UTF-8 output and writes no encoding label.
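A workaround of that kind amounts to an alias lookup before the strict check. The sketch below is a hypothetical Python illustration, not the FPC parser's actual code; the tables hold only a handful of names, not the full IANA registry.

```python
from typing import Optional

# A few IANA-registered names mapped to Python codec names
# (a tiny illustrative subset, not the full registry).
REGISTERED = {
    "UTF-8": "utf-8",
    "UTF-16": "utf-16",
    "ISO-8859-1": "latin-1",
    "ISO_8859-1": "latin-1",
}

# Non-registered spellings tolerated anyway (hypothetical workaround table).
LEGACY_ALIASES = {"ISO8859-1": "ISO-8859-1"}


def resolve_label(label: str, lenient: bool = True) -> Optional[str]:
    """Map an XML encoding label to a codec name, or None if unknown."""
    # The XML spec recommends matching encoding names case-insensitively.
    name = label.strip().upper()
    if lenient:
        name = LEGACY_ALIASES.get(name, name)
    return REGISTERED.get(name)


print(resolve_label("ISO8859-1"))                 # latin-1
print(resolve_label("ISO8859-1", lenient=False))  # None
```

With lenient=False the function behaves like a strictly compliant parser and rejects the old makeskel spelling.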
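To summarize: Unicode can be used in fpdoc XML files. If a file has an
ISO8859-1 encoding label, the label should be removed or replaced with a
UTF-8 label, and any non-ASCII content must be converted from ISO-8859-1 to
UTF-8 at the same time (otherwise those bytes become invalid UTF-8). The
output stages of fpdoc may or may not have problems with Unicode; that
requires additional research.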
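For files that still carry the old label, the conversion can be sketched in Python like this. It is a hypothetical helper (relabel_to_utf8 is not part of fpdoc or makeskel) that also accepts the registered ISO-8859-1/ISO_8859-1 spellings, and it assumes the label is truthful, i.e. the file really is ISO-8859-1. It drops the label and relies on the UTF-8 default rather than writing a UTF-8 label.

```python
import re

# XML declaration at the very start of the file whose encoding pseudo-attribute
# is ISO8859-1 (old makeskel spelling) or ISO-8859-1 / ISO_8859-1.
_LEGACY_DECL = re.compile(
    rb'(<\?xml[^>]*?)\s+encoding\s*=\s*(["\'])ISO[-_]?8859-1\2',
    re.IGNORECASE,
)


def relabel_to_utf8(data: bytes) -> bytes:
    """Convert an ISO-8859-1 labelled XML file to unlabelled UTF-8."""
    m = _LEGACY_DECL.match(data)
    if m is None:
        return data  # no legacy label: leave the file untouched
    # Keep the rest of the declaration (version, standalone), drop the label.
    data = m.group(1) + data[m.end():]
    # Every byte value is a valid ISO-8859-1 character, so decoding cannot fail.
    return data.decode("latin-1").encode("utf-8")
```

Typical use is to read a file with open(path, "rb"), pass the bytes through the function, and write the result back in binary mode.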
Sergei