[fpc-devel] Unicode support (again)
Jonas Maebe
jonas.maebe at elis.ugent.be
Tue Nov 11 13:31:20 CET 2008
On 11 Nov 2008, at 13:15, Michael Schnell wrote:
> OTOH, in this special case, I don't see why the compiler should
> "normalize" "u¨" to "ü". If the software is supposed to be handling
> unicode, the unicode string "u¨" should be considered a perfectly
> legal two-code-point information consisting of a "u" (a single sub-
> code in UTF-8) and a double-dot (supposedly two subcodes in UTF-8).
Note that I was simplifying. It's not actually "u¨", but "u" followed
by the code point meaning "put ¨ on top of the preceding character".
In other words, there are three distinct sequences (all in UTF-8):
a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by
"COMBINING DIAERESIS", which is encoded as $CC $88
c) "u¨": "LATIN SMALL LETTER U", encoded as $75, followed by
"DIAERESIS", which is encoded as $C2 $A8
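For anyone who wants to check the byte sequences listed above, here is a minimal sketch. It uses Python purely as a convenient stand-in (it is not FPC code), and the helper name utf8_hex and the forms table are made up for this illustration.

```python
# Illustrative only: Python stand-in, not FPC code.
def utf8_hex(s: str) -> str:
    """Return the UTF-8 bytes of s in the $XX notation used above."""
    return " ".join(f"${b:02X}" for b in s.encode("utf-8"))

# The three sequences, written as explicit code points so that no editor
# or mail client can silently normalize them.
forms = {
    "a": "\u00fc",    # LATIN SMALL LETTER U WITH DIAERESIS
    "b": "u\u0308",   # LATIN SMALL LETTER U + COMBINING DIAERESIS
    "c": "u\u00a8",   # LATIN SMALL LETTER U + DIAERESIS
}

for label, text in forms.items():
    print(f"{label}) {len(text)} code point(s): {utf8_hex(text)}")
```

Writing the strings with escapes rather than literal characters is deliberate: a) and b) usually render identically, so a literal in the source would not show which one was meant.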
> If the user wants to handle this as a single "ü", he should write
> appropriate code for that. Any automation on that is dangerous.
In both cases a) and b), the code point sequence literally means "ü".
It is not up to the user to decide whether or not it means "ü".
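The equivalence of a) and b) is what Unicode calls canonical equivalence, and it can be checked with any normalization library. The following is only a sketch under that assumption, using Python's unicodedata module rather than anything in FPC; canonically_equal is a hypothetical helper name.

```python
import unicodedata

precomposed = "\u00fc"   # a) LATIN SMALL LETTER U WITH DIAERESIS
combining = "u\u0308"    # b) u + COMBINING DIAERESIS
spacing = "u\u00a8"      # c) u + DIAERESIS (a separate spacing character)

def canonically_equal(x: str, y: str) -> bool:
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

# A raw code-point comparison treats a) and b) as different strings...
print(precomposed == combining)
# ...but after canonical normalization they compare equal.
print(canonically_equal(precomposed, combining))
# c) is not canonically equivalent to a): it stays "u" followed by "¨".
print(canonically_equal(precomposed, spacing))
```

Only a) and b) collapse to the same string; c) keeps its two separate characters under canonical normalization.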
Jonas