[fpc-devel] Re: enumerators
Jonas Maebe
jonas.maebe at elis.ugent.be
Wed Nov 17 13:20:59 CET 2010
On 17 Nov 2010, at 12:23, Michael Schnell wrote:
> Regarding that handling surrogate pairs needs tables while UTF/UCS
> handling can be done by simple algorithms and that (AFAIK) surrogate
> pairs are used only in certain environments (Mac and what else ?)
Surrogate pairs have nothing to do with Mac OS X. Surrogate pairs are
required when encoding any codepoint in UTF-16 whose UTF-32 value is >=
$10000.
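As a rough illustration of that rule (a Python sketch, not RTL code; the helper names to_surrogate_pair and from_surrogate_pair are made up for this example), the pair follows from the codepoint value by plain arithmetic as defined for UTF-16:

```python
def to_surrogate_pair(codepoint):
    """Split a codepoint in $10000..$10FFFF into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("only codepoints >= 0x10000 need a surrogate pair")
    offset = codepoint - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go into the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go into the low surrogate
    return high, low


def from_surrogate_pair(high, low):
    """Inverse operation: rebuild the codepoint from a valid surrogate pair."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x1F600)
    print(hex(high), hex(low))          # 0xd83d 0xde00
```

Codepoints below $10000 are stored as a single 16-bit code unit, so the function rejects them rather than inventing a pair.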
What you are probably thinking of are decomposed characters (where e.g. "e"
and "¨" are encoded separately, instead of as "ë"). The RTL will never
do anything special about them, since they are two regular separate
codepoints. And then there's of course the fact that more than one
composed character can map to the same decomposed character, see e.g.
http://unicode.org/reports/tr15/#Primary_Exclusion_List_Table, and many
other issues listed on that page.
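To make the composed/decomposed distinction concrete, here is a small sketch using Python's standard unicodedata module (chosen only for illustration; the variable names are arbitrary). It shows that "ë" and "e" followed by a combining diaeresis are different codepoint sequences, and that two distinct composed characters (ANGSTROM SIGN and LATIN CAPITAL LETTER A WITH RING ABOVE) share one decomposition:

```python
import unicodedata

composed = "\u00EB"        # "ë" as one codepoint
decomposed = "e\u0308"     # "e" followed by COMBINING DIAERESIS

# Without normalization these are simply different strings.
print(composed == decomposed)            # False
print(len(composed), len(decomposed))    # 1 2

angstrom_sign = "\u212B"   # ANGSTROM SIGN
a_with_ring = "\u00C5"     # LATIN CAPITAL LETTER A WITH RING ABOVE

# Both composed characters decompose to "A" + COMBINING RING ABOVE.
nfd_a = unicodedata.normalize("NFD", angstrom_sign)
nfd_b = unicodedata.normalize("NFD", a_with_ring)
print(nfd_a == nfd_b == "A\u030A")       # True
```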
In general: if you want to assume that a unicode string is in a
particular form, convert it to a particular canonical form and operate
on that (and keep in mind that you may destroy data in the process,
like with most code page conversions).
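A minimal sketch of that approach, again in Python with the standard unicodedata module (the helper canonical_equal is a hypothetical name, and NFC is just one possible choice of form): normalize both operands first, then compare. The second half shows the kind of data loss meant above, since OHM SIGN does not survive normalization:

```python
import unicodedata


def canonical_equal(a, b, form="NFC"):
    """Compare two strings after converting both to the same normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Composed and decomposed "ë" compare equal once both are normalized.
print(canonical_equal("\u00EB", "e\u0308"))    # True

# Normalization is not reversible: OHM SIGN becomes GREEK CAPITAL LETTER OMEGA,
# and nothing in the result records which of the two the input contained.
ohm_sign = "\u2126"
normalized = unicodedata.normalize("NFC", ohm_sign)
print(normalized == "\u03A9")                  # True
print(normalized == ohm_sign)                  # False
```

Whether NFC or NFD is the better working form depends on what the code does afterwards; the point is only that the choice is made once and explicitly.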
Jonas