[fpc-pascal] Unicode strings

Marco van de Voort marcov at stack.nl
Fri Dec 31 13:17:56 CET 2010


In our previous episode, Juha Manninen said:
> > Widestring (refcounted 2-byte type) , it is the ansistring type (1-byte
> > type) that gets codepage support.
> 
> UTF-16 needs codepages, too.
> I think only the 4-byte char type (is it UTF-32) would solve all encoding 
> problems. 

codepage<>encoding

> All characters of all languages fit into 2^32 space.

character<>codepoint.
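
To illustrate the distinction, here is a minimal sketch in Python (chosen only for brevity; it is not FPC code and the variable names are mine). It assumes the usual example of an accented letter written as a base letter plus a combining mark: one character to the reader, two codepoints in the string, whatever the width of the char type.

```python
import unicodedata

# "e" followed by U+0301 (combining acute accent): one character on
# screen, but two codepoints in the string.
decomposed = "e\u0301"

# NFC normalization composes the pair into the single codepoint U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# Python's len() counts codepoints, not user-perceived characters.
print(len(decomposed))
print(len(composed))

# Even stored as UTF-32, the decomposed form still takes two 4-byte units.
utf32_units = len(decomposed.encode("utf-32-le")) // 4
print(utf32_units)
```

So a 4-byte char type by itself does not give you "one array element per character".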

Anyway, surrogates etc. are a separate problem in processing the full Unicode
spec, and the bits that UTF-32 solves are the lesser ones (and it solves them
at the expense of speed and memory).
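
As a rough sketch of both points (again in Python for brevity, with a sample string I picked myself): a codepoint outside the BMP takes a surrogate pair in UTF-16 but a single unit in UTF-32, while plain ASCII text costs four bytes per character in UTF-32.

```python
# 'a' plus U+1F600, a codepoint outside the Basic Multilingual Plane.
text = "a\U0001F600"

utf16 = text.encode("utf-16-le")
utf32 = text.encode("utf-32-le")

# UTF-16: 'a' is one 2-byte unit, U+1F600 is a surrogate pair (two units).
utf16_units = len(utf16) // 2
# UTF-32: every codepoint is exactly one 4-byte unit.
utf32_units = len(utf32) // 4
print(utf16_units, utf32_units)

# The memory cost: the same ASCII text in the three encoding forms.
ascii_text = "hello"
sizes = tuple(len(ascii_text.encode(enc))
              for enc in ("utf-8", "utf-16-le", "utf-32-le"))
print(sizes)
```

UTF-32 removes the need to handle the surrogate pair, but nothing more than that, and the size of ordinary text doubles compared to UTF-16.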

The document is a bit messy (it was conceived before Delphi 2009 came out,
and was later updated for it), but

http://www.stack.nl/~marcov/unicode.pdf

has some details.


