[fpc-devel] Unicode in the RTL (my ideas)

Vincent Snijders vincent.snijders at gmail.com
Thu Aug 23 08:30:39 CEST 2012

2012/8/23 Hans-Peter Diettrich <DrDiettrich1 at aol.com>:
> Daniël Mantione wrote:
>> On Wed, 22 Aug 2012, Felipe Monteiro de Carvalho wrote:
>>> On Wed, Aug 22, 2012 at 9:36 PM, Martin Schreiber <mse00000 at gmail.com>
>>> wrote:
>>>> I am not talking about Unicode. I am talking about the day-to-day
>>>> programming of an average programmer, where life is easier with
>>>> utf-16 than with utf-8.
>>>> Unicode is not done by using pos() instead of character indexes.
>>>> I think everybody knows my opinion, I stop now.
>>> Please be clear in the terminology. Don't say "life is easier with
>>> utf-16 than with utf-8" if you don't mean utf-16 as it is. Just say
>>> "life is easier with ucs-2 than with utf-8", then everything is clear
>>> that you are talking about ucs2 and not true utf-16.
>> That is nonsense.
>> * There are no whitespace characters beyond the widechar range. This
>>   means you can write a routine to split a string into words without
>>   bothering about surrogate pairs and remain fully UTF-16 compliant.
> How is this different for UTF-8?

There are white space characters beyond the single-byte (ASCII) range,
for example U+00A0 NO-BREAK SPACE, which UTF-8 encodes as the two
bytes C2 A0.

So in UTF-8 a white space character can be longer than 1 byte, while
in UTF-16 each one is exactly 2 bytes (one code unit). That is the
difference.
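
To make the size difference concrete, here is a small sketch in Python
(not FPC code, just an illustration). It takes a handful of white space
code points, which are a sample chosen for this example and not the
full Unicode white space list, and reports how many bytes each one
needs in UTF-8 and in UTF-16. The names SAMPLES and encoded_sizes are
made up for the illustration.

```python
# A few white space code points (a sample, not the complete Unicode list).
SAMPLES = {
    "SPACE": "\u0020",
    "NO-BREAK SPACE": "\u00a0",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",
}


def encoded_sizes(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so that no byte order mark is added to the count.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


sizes = {name: encoded_sizes(ch) for name, ch in SAMPLES.items()}

for name, (utf8_len, utf16_len) in sizes.items():
    print(f"{name:18} UTF-8: {utf8_len} byte(s)  UTF-16: {utf16_len} byte(s)")
```

For these samples the UTF-8 length varies from 1 to 3 bytes, while the
UTF-16 length is always 2 bytes, so a word splitter working on UTF-16
code units can compare one unit at a time.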

