[fpc-pascal] Unicode strings
Marco van de Voort
marcov at stack.nl
Fri Dec 31 13:17:56 CET 2010
In our previous episode, Juha Manninen said:
> > Widestring (refcounted 2-byte type), it is the ansistring type (1-byte
> > type) that gets codepage support.
>
> UTF-16 needs codepages, too.
> I think only the 4-byte char type (is it UTF-32?) would solve all encoding
> problems.
codepage <> encoding (a codepage is not the same thing as an encoding).
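A rough illustration of the difference, sketched in Python rather than Pascal purely for demonstration (it uses no Free Pascal API, and the helper names are hypothetical). With a legacy codepage, the codepage decides which character a byte stands for. With Unicode, the character set is fixed, and UTF-8/16/32 are only different ways of turning the same codepoint into bytes:

```python
# Hypothetical helpers for illustration only; not part of any FPC unit.

def decode_byte(b: int, codepage: str) -> str:
    """Interpret a single byte under a legacy 8-bit codepage."""
    return bytes([b]).decode(codepage)

def encodings_of(ch: str) -> dict:
    """Encode one character in several Unicode encodings (BOM-less variants)."""
    return {enc: ch.encode(enc).hex() for enc in ("utf-8", "utf-16-le", "utf-32-le")}

# Same byte, different codepages -> different characters.
print(decode_byte(0xE9, "cp1252"))  # é
print(decode_byte(0xE9, "cp1251"))  # й

# Same codepoint (U+00E9), different encodings -> different bytes.
for enc, hexbytes in encodings_of("é").items():
    print(enc, hexbytes)  # c3a9, e900, e9000000
```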
> All characters of all languages fit into 2^32 space.
character <> codepoint; one character can take several codepoints.
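A small sketch (Python, for illustration only) of why a 4-byte char type still does not give "one unit per character": "é" can be stored as one precomposed codepoint or as "e" followed by a combining accent. The decomposed form needs two UTF-32 units, and the two spellings only compare equal after normalization:

```python
import unicodedata

precomposed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" + COMBINING ACUTE ACCENT

def codepoints(s: str) -> list:
    """List the codepoints of a string in U+XXXX notation."""
    return [f"U+{ord(c):04X}" for c in s]

print(codepoints(precomposed))  # ['U+00E9']
print(codepoints(decomposed))   # ['U+0065', 'U+0301']

# UTF-32 stores one 4-byte unit per codepoint, so this one character is 2 units.
print(len(decomposed.encode("utf-32-le")) // 4)  # 2

# Plain comparison sees different codepoint sequences; NFC normalization unifies them.
print(precomposed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```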
Anyway, surrogates etc. are a different problem of processing the true Unicode
spec, and the parts that UTF-32 solves are the lesser ones (and it solves them
at the expense of speed and memory).
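To make the trade-off concrete, here is a minimal Python sketch (illustrative helpers, not an FPC API). A codepoint outside the Basic Multilingual Plane, such as U+1F600, needs a surrogate pair in UTF-16 but a single unit in UTF-32; in exchange, UTF-32 quadruples the size of plain ASCII text compared with UTF-8:

```python
def code_units(s: str) -> dict:
    """Count code units per encoding (BOM-less variants, so no byte-order mark)."""
    return {
        "utf-8": len(s.encode("utf-8")),            # 1-byte units
        "utf-16": len(s.encode("utf-16-le")) // 2,  # 2-byte units
        "utf-32": len(s.encode("utf-32-le")) // 4,  # 4-byte units
    }

def utf16_surrogates(cp: int) -> tuple:
    """Split a supplementary codepoint into its UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary codepoints need surrogates")
    v = cp - 0x10000
    return (0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF))

grin = "\U0001F600"  # GRINNING FACE, outside the BMP
print(code_units(grin))                               # {'utf-8': 4, 'utf-16': 2, 'utf-32': 1}
print([hex(u) for u in utf16_surrogates(ord(grin))])  # ['0xd83d', '0xde00']

# Memory cost for an ASCII-only string, in bytes.
text = "Unicode strings"
print({enc: len(text.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-32-le")})
# {'utf-8': 15, 'utf-16-le': 30, 'utf-32-le': 60}
```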
The document is a bit messy (it was conceived before Delphi 2009 came out
and was later updated for it), but
http://www.stack.nl/~marcov/unicode.pdf
has some details.