[fpc-devel] Unicode support (again)

Tue Nov 11 13:31:20 CET 2008

On 11 Nov 2008, at 13:15, Michael Schnell wrote:

> OTOH, in this special case, I don't see why the compiler should  
> "normalize" "u¨" to "ü". If the software is supposed to be handling  
> unicode, the unicode string "u¨" should be considered a perfectly  
> legal two-code-point information consisting of a "u" (a single
> sub-code in UTF-8) and a double-dot (supposedly two subcodes in UTF-8).

Note that I was simplifying. It's not actually "u¨", but "u" followed
by the code point meaning "put ¨ on top of the preceding character".
In other words, there are three distinct sequences (all in UTF-8):

a) "ü": "LATIN SMALL LETTER U WITH DIAERESIS", encoded as $C3 $BC
b) "ü": "LATIN SMALL LETTER U", encoded as $75, followed by  
"COMBINING DIAERESIS", which is encoded as $CC $88
c) "u¨": "LATIN SMALL LETTER U", encoded as $75, followed by  
"DIAERESIS", which is encoded as $C2 $A8
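For anyone who wants to check the byte values themselves, here is a
minimal sketch in Python (not FPC, simply because its standard library
exposes the Unicode character database). The variable and helper names
are made up for illustration.

```python
import unicodedata

# The three sequences from the list above, built from explicit code points.
precomposed = "\u00FC"   # a) LATIN SMALL LETTER U WITH DIAERESIS
combining = "u\u0308"    # b) "u" followed by COMBINING DIAERESIS
spacing = "u\u00A8"      # c) "u" followed by the standalone DIAERESIS

def utf8_hex(s):
    """Return the UTF-8 bytes of s in the $XX notation used above."""
    return " ".join("$%02X" % b for b in s.encode("utf-8"))

def names(s):
    """Return the Unicode character name of every code point in s."""
    return [unicodedata.name(ch) for ch in s]

for label, s in (("a", precomposed), ("b", combining), ("c", spacing)):
    print(label, utf8_hex(s), names(s))
```

Note that a) is one code point while b) and c) are two code points
each, even though a) and b) are displayed identically.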

> If the user wants to handle this as a single "ü", he should write  
> appropriate code for that. Any automation on that is dangerous.

In both a) and b), the character sequence literally means "ü"; the two
are canonically equivalent in Unicode. Whether or not it means "ü" is
not up to the user.
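To illustrate the point, here is a rough sketch (again in Python rather
than FPC, and assuming "both cases" refers to a) and b)) that compares
the sequences after NFC normalization. The helper canonically_equal is a
hypothetical name, not an existing library function.

```python
import unicodedata

precomposed = "\u00FC"   # a) single precomposed code point
combining = "u\u0308"    # b) "u" + COMBINING DIAERESIS
spacing = "u\u00A8"      # c) "u" + standalone DIAERESIS

def canonically_equal(x, y):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", x) == unicodedata.normalize("NFC", y)

# A plain code point comparison sees a) and b) as different strings ...
print(precomposed == combining)
# ... but after normalization they compare equal.
print(canonically_equal(precomposed, combining))
# c) is a different text and stays different under NFC.
print(canonically_equal(precomposed, spacing))
```

So a comparison that does not normalize treats two spellings of the
same text as different, which is exactly the problem under discussion.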

Jonas