[fpc-devel] Unicode in the RTL (my ideas)

Thu Aug 23 08:30:39 CEST 2012

2012/8/23 Hans-Peter Diettrich <DrDiettrich1 at aol.com>:
> Daniël Mantione wrote:
>
>> On Wed, 22 Aug 2012, Felipe Monteiro de Carvalho wrote:
>>
>>> On Wed, Aug 22, 2012 at 9:36 PM, Martin Schreiber <mse00000 at gmail.com>
>>> wrote:
>>>>
>>>> I am not talking about Unicode. I am talking about the day-to-day
>>>> programming of an average programmer, where life is easier with utf-16
>>>> than with utf-8.
>>>> Unicode is not done by using pos() instead of character indexes.
>>>> I think everybody knows my opinion, I stop now.
>>>
>>>
>>> Please be clear in the terminology. Don't say "life is easier with
>>> utf-16 than with utf-8" if you don't mean utf-16 as it is. Just say
>>> "life is easier with ucs-2 than with utf-8"; then it is clear that
>>> you are talking about ucs-2 and not true utf-16.
>>
>>
>> That is nonsense.
>>
>> * There are no whitespace characters beyond the widechar range. This means
>>   you can write a routine to split a string into words without bothering
>>   about surrogate pairs and remain fully UTF-16 compliant.
>
>
> How is this different for UTF-8?
>

There are whitespace characters beyond the ASCII range (the only range a
single UTF-8 byte can encode), for example U+00A0 NO-BREAK SPACE.
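A minimal Python sketch to check this (Python is used only for illustration; the thread itself is about FPC). It assumes Python's str.isspace() as the definition of "whitespace", which is close to, but not identical with, the Unicode White_Space property; FPC's own classification may differ. It collects the distinct byte lengths of all whitespace characters in each encoding. "utf-16-le" is used instead of "utf-16" so that no byte-order mark is added to the count.

```python
import sys


def whitespace_chars() -> list[str]:
    """All code points that Python's str.isspace() treats as whitespace.

    Assumption: str.isspace() stands in for "whitespace" here.
    """
    return [chr(cp) for cp in range(sys.maxunicode + 1) if chr(cp).isspace()]


def encoded_lengths(encoding: str) -> set[int]:
    """Distinct byte lengths of every whitespace character in `encoding`."""
    return {len(ch.encode(encoding)) for ch in whitespace_chars()}


if __name__ == "__main__":
    print("UTF-8 :", sorted(encoded_lengths("utf-8")))      # [1, 2, 3]
    print("UTF-16:", sorted(encoded_lengths("utf-16-le")))  # [2]
    print("U+00A0 as UTF-8:", "\u00a0".encode("utf-8").hex(" "))  # c2 a0
```

In UTF-8 a whitespace test therefore has to look at sequences of one to three bytes, while in UTF-16 a single 16-bit comparison is enough.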

So in UTF-8 a whitespace character can take more than one byte, while in
UTF-16 they all take exactly 2 bytes (one code unit). That is the difference.
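The practical consequence for the word-splitting routine mentioned earlier can be sketched as follows. This is a hypothetical Python illustration, not FPC code: the helper names (utf16_units, units_to_str, split_words_utf16) are made up for the example, and str.isspace() is again assumed as the whitespace definition. The splitter compares one UTF-16 code unit at a time and never decodes surrogate pairs; that is safe because no whitespace code point lies outside the BMP and surrogate units (0xD800-0xDFFF) are never whitespace.

```python
import sys

# Assumption: Python's str.isspace() stands in for "whitespace"; FPC's own
# classification may differ in detail.
WHITESPACE = {cp for cp in range(sys.maxunicode + 1) if chr(cp).isspace()}

# Every whitespace code point lies in the BMP, so each one is exactly one
# UTF-16 code unit and can be matched with a single 16-bit comparison.
assert max(WHITESPACE) <= 0xFFFF


def utf16_units(text: str) -> list[int]:
    """Encode text as UTF-16LE and return its 16-bit code units."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def units_to_str(units: list[int]) -> str:
    """Decode a list of UTF-16 code units back into a Python string."""
    return b"".join(u.to_bytes(2, "little") for u in units).decode("utf-16-le")


def split_words_utf16(units: list[int]) -> list[list[int]]:
    """Split UTF-16 code units on whitespace, one code unit at a time.

    Surrogate units are never whitespace, so a surrogate pair always
    stays inside one word without any explicit surrogate handling.
    """
    words: list[list[int]] = []
    current: list[int] = []
    for unit in units:
        if unit in WHITESPACE:
            if current:
                words.append(current)
                current = []
        else:
            current.append(unit)
    if current:
        words.append(current)
    return words


if __name__ == "__main__":
    # U+00A0 and U+3000 are non-ASCII whitespace; U+1D11E needs a surrogate pair.
    text = "na\u00efve\u00a0caf\u00e9 \U0001D11Emusic\u3000end"
    print([units_to_str(w) for w in split_words_utf16(utf16_units(text))])
```

A UTF-8 version of the same loop would have to recognise multi-byte sequences such as C2 A0 for U+00A0, rather than testing each byte on its own.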

Vincent