[fpc-pascal] UnicodeString and surrogate pairs

Sven Barth pascaldragon at googlemail.com
Sat Apr 30 08:31:33 CEST 2016


On 30.04.2016 08:24, "Michael Schnell" <mschnell at lumino.de> wrote:
>
> On 04/29/2016 11:09 AM, Graeme Geldenhuys wrote:
>>
>>
>> No, because UTF-8 doesn't use surrogate pairs.
>
> Really ?
>
> I understand that "surrogate pairs" is combining a printable character
> (i.e. one of the nearly 2^32 UTF thingies) with another of those to be
> combined to a different printable thingy (e.g. "A" plus "add two dots
> above" to create a "Ä").

No, that's a different thingie. Surrogate pairs are used in UTF-16 to
represent code points above $FFFF (U+10000 to U+10FFFF), which do not fit
into a single 16-bit code unit. What you are talking about is, I think,
combining characters and (de)composition, which is a much more complex
topic because you need to know which characters can be combined.
Surrogate pairs, on the other hand, use reserved 16-bit code unit ranges
($D800-$DBFF for the first part, $DC00-$DFFF for the second part) that
act as the first and second part of the character.
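To make the distinction concrete, here is a minimal sketch (in Python, since the thread contains no code) of how a code point above $FFFF is split into a UTF-16 surrogate pair and rejoined, contrasted with UTF-8, which encodes the same code point directly, and with a combining sequence, which is two separate code points. The helper names `to_surrogate_pair` and `from_surrogate_pair` are hypothetical and are not Free Pascal RTL routines; the arithmetic is the standard UTF-16 scheme, which is what a UTF-16 string type such as FPC's UnicodeString stores.

```python
import unicodedata
from typing import Tuple


def to_surrogate_pair(cp: int) -> Tuple[int, int]:
    """Split a code point in U+10000..U+10FFFF into a UTF-16 high/low surrogate pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF need a surrogate pair")
    v = cp - 0x10000             # remaining 20-bit value
    high = 0xD800 + (v >> 10)    # top 10 bits -> $D800..$DBFF
    low = 0xDC00 + (v & 0x3FF)   # bottom 10 bits -> $DC00..$DFFF
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Rejoin a high/low surrogate pair into the original code point (hypothetical helper)."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


if __name__ == "__main__":
    cp = 0x1F600  # U+1F600 GRINNING FACE, outside the Basic Multilingual Plane
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:X} -> UTF-16 code units ${high:04X} ${low:04X}")  # $D83D $DE00

    # Cross-check against Python's own UTF-16 (big-endian, no BOM) codec
    expected = bytes([high >> 8, high & 0xFF, low >> 8, low & 0xFF])
    assert chr(cp).encode("utf-16-be") == expected
    assert from_surrogate_pair(high, low) == cp

    # UTF-8 encodes the same code point directly as a 4-byte sequence, no surrogates
    print(chr(cp).encode("utf-8").hex(" "))  # f0 9f 98 80

    # Combining characters are a separate mechanism: "A" + COMBINING DIAERESIS
    # is two code points, which NFC normalization composes into U+00C4 "Ä"
    decomposed = "A\u0308"
    composed = unicodedata.normalize("NFC", decomposed)
    print(len(decomposed), len(composed), composed == "\u00C4")  # 2 1 True
```

Note that every UTF-16 code unit in the surrogate ranges is unambiguous on its own: a decoder can tell from the value alone whether it is looking at the first or second half of a pair, which is not true of combining sequences, where you need Unicode character data to know what combines.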

Regards,
Sven