[fpc-devel] Unicode support (yet again)
Marco van de Voort
marcov at stack.nl
Thu Sep 15 10:15:50 CEST 2011
In our previous episode, Hans-Peter Diettrich said:
> > Lazarus was forced to make out of the identity of ANSIString and
> > UTF8String seemingly forced by FPC. e.g.:
> >
> > Old programs assuming local ANSI 8 bit code retrieved from LCL GUI
> > components, compiled with the new version don't work (e.g. if doing
> > myChar := myString[3]; )
>
> How many bytes must a char have, when it shall allow to store any
> (logical) character?
According to Unicode, n code points. A code point currently needs 21 bits
(the range ends at U+10FFFF), but that could be expanded in the more distant
future.
IIRC n is in the range of 5-8 or so, the maximum number of code points that
can be combined into one printable character.
So if you want to do it up to spec, a character is roughly 256 bits
(8 code points at 32 bits each).
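To make the "n code points per character" point concrete, here is a minimal sketch in Python 3 (chosen only for illustration, it is not FPC code). The sample string and all variable names are assumptions picked for the example. It builds one printable character out of three code points and shows that indexing returns only the base letter, which is the same trap as myChar := myString[3].

```python
import unicodedata

# One user-perceived character: "e" plus two combining marks.
# (Illustrative sample, not taken from the original post.)
cluster = "e\u0301\u0323"  # e, COMBINING ACUTE ACCENT, COMBINING DOT BELOW

# Each element of a Python str is one code point, not one printable character.
code_point_count = len(cluster)

# Canonical combining class: 0 for the base letter, non-zero for the marks.
combining_classes = [unicodedata.combining(c) for c in cluster]

# Bits needed for the highest code point currently defined (U+10FFFF).
bits_per_code_point = (0x10FFFF).bit_length()

# Storage for the same single printable character in two encodings.
utf8_size = len(cluster.encode("utf-8"))
utf16_size = len(cluster.encode("utf-16-le"))

# Indexing yields only the base letter; the accents are lost.
first_element = cluster[0]

print(code_point_count, combining_classes, bits_per_code_point)
print(utf8_size, utf16_size, repr(first_element))
```

Whatever fixed width is picked for a char type, a string like this one shows that a single element cannot be guaranteed to hold a whole printable character.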
> Unicode users have no use for an char type, instead they have to use
> substrings for every logical character. A Unicode BMP user could be happy
> with a 2-byte char, of course, at his own (low) risk.
Probably. But while that is a fair point for an application builder based in
the West, it is IMHO not acceptable to cut corners in the Unicode
implementation of system and development tools.
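The risk with a 2-byte char mentioned in the quoted text can be sketched as follows, again in Python 3 purely for illustration. The helper utf16_units and the two sample characters are assumptions made up for this example. A BMP character fits in one 16-bit unit, while a character outside the BMP needs a surrogate pair, so a 2-byte char would hold only half of it.

```python
import struct

def utf16_units(text):
    """Return the 16-bit code units of text encoded as UTF-16 (little endian)."""
    data = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

# U+00E9 (e with acute) lies inside the BMP: one 16-bit unit is enough.
bmp_units = utf16_units("\u00e9")

# U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP: two units are needed.
astral_units = utf16_units("\U0001D11E")

print([hex(u) for u in bmp_units])
print([hex(u) for u in astral_units])
```

Neither half of the surrogate pair is a valid character on its own, which is why a tool that treats one 16-bit unit as one character is cutting a corner.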