[fpc-devel] Unicode support (again)
plugwash at p10link.net
Tue Nov 11 12:51:17 CET 2008
Michael Schnell wrote:
>> It will at best be "friendly old school behaviour which works most of
>> the time, but which fails as soon as the strings are not completely
>> normalised because then you can have decomposed characters and
>> whatnot" (which in turn easily leads to security holes due to
>> incomplete checks, hard to reproduce bugs and "write once, debug
>> everywhere"-style behaviour).
> Sorry, I don't understand. What not normalized behavior needs to be
> taken into account ?
Remember that an individual code point does not necessarily represent
what a user would consider a character. Indeed, one character may be
representable in more than one way (either as a precomposed character or
as a sequence of a base character and combining diacritics). And even if
we ignore combining diacritics, the number of console positions a string
takes is not necessarily equal to the code point count either, since many
CJK characters take two console positions.
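
To make the precomposed/decomposed point concrete, here is a minimal
sketch in Python (not FPC code; Python is used only because its standard
library ships the Unicode data). The helper name same_text is made up
for this illustration, and the choice of NFC as the normalisation form
is an assumption, not something the thread prescribes.

```python
import unicodedata

# The same user-visible character, written two ways.
precomposed = "\u00e9"    # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"    # U+0065 followed by U+0301 COMBINING ACUTE ACCENT

def same_text(a, b):
    """Hypothetical helper: compare two strings after NFC normalisation."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A plain comparison sees two different code point sequences.
print(precomposed == decomposed)

# Comparing normalised forms treats them as the same text.
print(same_text(precomposed, decomposed))
```

A check that compares the raw sequences would let the decomposed form
slip past a filter written against the precomposed one, which is the
kind of incomplete check the quoted text warns about.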
Given these facts, code point counts and indexes are not much more
useful than code unit indexes and counts.
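
As a rough illustration of how the different counts diverge, the
following Python sketch reports code units, code points and an estimated
console width for a string. The functions console_columns and counts are
hypothetical names for this example, and the width rule (two columns for
East Asian Wide/Fullwidth, zero for combining marks, one otherwise) is a
simplifying assumption; real terminals differ in the details.

```python
import unicodedata

def console_columns(s):
    """Rough estimate of console columns (assumed rule, see above)."""
    cols = 0
    for ch in s:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on the previous character
        wide = unicodedata.east_asian_width(ch) in ("W", "F")
        cols += 2 if wide else 1
    return cols

def counts(s):
    """Return the various 'lengths' of a string side by side."""
    return {
        "utf8_units": len(s.encode("utf-8")),
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        "code_points": len(s),
        "columns": console_columns(s),
    }

for sample in ("e\u0301", "\u65e5\u672c"):
    print(repr(sample), counts(sample))
```

For the decomposed "e" plus acute accent, none of the unit or code point
counts matches the single column the user sees; for the two CJK
characters, the code point count is half the column count.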
And if you need something better than either code point count or code
unit count then you have little choice but to pull in an external
library. Pulling in an external library with a relatively unstable
interface is not something the compiler or RTL should be doing IMO.