[fpc-devel] Unicode support (yet again)

Thu Sep 15 10:15:50 CEST 2011

In our previous episode, Hans-Peter Diettrich said:
> > Lazarus had to build on the identity of ANSIString and UTF8String
> > that FPC seemingly forces, e.g.:
> > 
> > Old programs assuming local ANSI 8 bit code retrieved from LCL GUI 
> > components, compiled with the new version don't work (e.g. if doing 
> > myChar := myString[3]; )
> 
> How many bytes must a char have, when it shall allow to store any 
> (logical) character?

According to Unicode: n codepoints.  A codepoint currently needs 21 bits
(the range ends at U+10FFFF), but that could be expanded in the more distant
future.

IIRC n is in the range of 5-8 or so, i.e. the maximum number of codepoints
that can be combined into one printable character.

So if you want to do it up to spec, a character is roughly 256 bits.
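
To illustrate how one printable character can consist of several
codepoints, here is a minimal sketch in Python (chosen only for brevity).
The sample string and the helper name codepoint_report are my own
assumptions for illustration, not anything from FPC or the LCL:

```python
import unicodedata

def codepoint_report(text):
    """Return (codepoint, bit length, is_combining) for each codepoint in text."""
    return [
        (f"U+{ord(ch):04X}", ord(ch).bit_length(), unicodedata.combining(ch) != 0)
        for ch in text
    ]

# 'e' followed by COMBINING ACUTE ACCENT and COMBINING DOT BELOW:
# one printable character, but three codepoints.
sample = "e\u0301\u0323"

for row in codepoint_report(sample):
    print(row)

# The largest codepoint, U+10FFFF, needs 21 bits.
print((0x10FFFF).bit_length())
```

The point is only that a fixed-size char cannot hold such a sequence unless
it is sized for the longest combination one is willing to support.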

> Unicode users have no use for an char type, instead they have to use
> substrings for every logical character.  A Unicode BMP user could be happy
> with a 2-byte char, of course, at his own (low) risk.

Probably. But while that is a good point for an application builder based in
the West, it is IMHO not acceptable to cut corners in the Unicode
implementation in system and development tools.
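
As a small illustration of the risk a BMP-only 2-byte char carries, and of
why byte indexing such as myString[3] misbehaves on UTF-8 data, here is a
hedged sketch in Python. The helper utf16_units and the sample strings are
hypothetical, picked only to show the effect:

```python
def utf16_units(text):
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp_char = "\u00e9"         # LATIN SMALL LETTER E WITH ACUTE, inside the BMP
astral_char = "\U0001D11E"  # MUSICAL SYMBOL G CLEF, outside the BMP

print(utf16_units(bmp_char))     # 1: fits in a 2-byte char
print(utf16_units(astral_char))  # 2: needs a surrogate pair

# Byte indexing into UTF-8, analogous to myString[3] on an 8-bit string
# (Pascal strings are 1-based, so the third byte is index 2 here).
utf8 = "a\u00e9b".encode("utf-8")
print(len(utf8))     # 4 bytes for 3 characters
print(hex(utf8[2]))  # 0xa9: a continuation byte, not a character
```

A 2-byte char is therefore fine only as long as no data outside the BMP ever
shows up, and an 8-bit char index into UTF-8 is fine only for pure ASCII.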