[fpc-pascal] UnicodeString and surrogate pairs
Michael Schnell
mschnell at lumino.de
Fri Apr 29 11:23:57 CEST 2016
On 04/29/2016 11:09 AM, Graeme Geldenhuys wrote:
>
> No, because UTF-8 doesn't use surrogate pairs.
Really ?
I understand that a "surrogate pair" is the combination of one printable
character (i.e. one of the nearly 2^32 UTF thingies) with another one,
so that the two combine into a different printable thingy (e.g. "A" plus
"add two dots above" to create an "Ä").
Now a series of 32-bit UTF thingies can be compressed either to a
series of UTF-8 encoded bytes or to a series of UTF-16 encoded words.
Both are usually much shorter (measured in bytes) than the
uncompressed UTF-32 information.
So the UTF8 vs UTF16 issue is a lower layer of encoding.
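
To make the layering concrete, here is a minimal sketch that encodes the
same code points as UTF-8, UTF-16 and UTF-32 and counts the bytes. It is
in Python rather than Pascal only because it is short to run, and the
helper name encoded_sizes is made up for this illustration. It assumes
the "-le" codec variants so that no byte order mark is counted.

```python
import unicodedata

def encoded_sizes(text):
    """Return the number of bytes `text` needs in each encoding."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le: no byte order mark
        "utf-32": len(text.encode("utf-32-le")),  # always 4 bytes per code point
    }

# "A" followed by U+0308 COMBINING DIAERESIS ("add two dots above"):
# two code points at the 32-bit layer, whatever encoding is used below it.
combining = "A\u0308"

# U+1D11E MUSICAL SYMBOL G CLEF: a single code point above U+FFFF.
clef = "\U0001D11E"

for sample in ("Pascal", combining, clef):
    print(repr(sample), len(sample), "code point(s)", encoded_sizes(sample))

# Composing "A" + U+0308 into the single code point U+00C4 ("Ä") is a
# normalization step on code points, separate from the byte encoding.
print(unicodedata.normalize("NFC", combining) == "\u00C4")
```

In this sketch the plain ASCII text is shortest in UTF-8, while the
single code point above U+FFFF needs 4 bytes in all three encodings
(two 16-bit words in the UTF-16 case).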
-Michael