[fpc-devel] Unicode support (yet again)

Martin lazarus at mfriebe.de
Fri Sep 16 01:13:05 CEST 2011


On 16/09/2011 00:03, cobines wrote:
> 2011/9/15 Hans-Peter Diettrich<DrDiettrich1 at aol.com>:
>> cobines schrieb:
>>> When doing:
>>> MyChar := MyString[1]
>>>
>>> appropriate function retrieves first unicode character, regardless of
>>> encoding.
>> This is just wrong :-(
>>
>> MyString[1] accesses the first element of the *physical* character array,
>> regardless of any encoding. Also Length returns the array size, not the
>> number of *logical* characters in it.
> Right. My point was that if I come from Ansi, knowing that MyString[1]
> retrieves the first character, and know nothing about Unicode, I might
> think it still retrieves the first character in Unicode regardless of
> string encoding (the RTL handles that). That is, as you say, wrong, hence
> the need for the developer to adapt code that uses such access, but
> people might not know this. Having a UTF-16 RTL might help them in the
> sense that they will never have to learn, until they deal with
> characters outside of the BMP.
>
Which means they will have to learn it immediately.

That is, of course, unless the application takes no user input at all.
As soon as text input from a user is processed, whatever language that
user speaks, the user may for some reason enter text where string[x]
returns half a character (a lone surrogate).
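
To make the "half a char" case concrete, here is a minimal sketch. It is
written in Python rather than Pascal and models a UTF-16 string as a list
of 16-bit code units, which is my assumption about how a UTF-16 RTL string
would be indexed. The helper names utf16_units and is_surrogate are made
up for illustration.

```python
def utf16_units(text: str) -> list[int]:
    """Return the UTF-16 code units of text as a list of integers."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_surrogate(unit: int) -> bool:
    """True if the code unit is one half of a surrogate pair."""
    return 0xD800 <= unit <= 0xDFFF


# Hypothetical user input: U+1F600 lies outside the BMP.
user_input = "\U0001F600 hi"
units = utf16_units(user_input)

# Analogous to MyString[1] on a UTF-16 string: the first code unit,
# not the first logical character.
first = units[0]

print(hex(first), is_surrogate(first))   # first unit is a high surrogate
print(len(user_input), len(units))       # logical characters vs. code units
```

The first unit is only the high surrogate of the pair, and the number of
code units is larger than the number of logical characters, which is the
same Length/array-size mismatch described earlier in the thread.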

With UTF-8, the developer would probably encounter the error fairly
soon, and learn before creating tons of wrong code.
With UTF-16, the developer may get away with it for many months, creating
tons of code that he then needs to correct.
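
A rough way to see why the error shows up sooner with UTF-8: check, for a
few sample inputs, whether the first storage element is a whole character
in each encoding. This is only a sketch in Python; the function name and
the sample strings are my own choices, not anything from the RTL.

```python
def first_unit_is_whole_char(text: str, encoding: str, unit_size: int) -> bool:
    """True if the first character of text fits in one code unit.

    unit_size is the code unit width in bytes: 1 for UTF-8, 2 for UTF-16.
    """
    return len(text[0].encode(encoding)) == unit_size


# Illustrative sample inputs, ordered from "common" to "rare".
samples = {
    "ascii": "abc",
    "accented": "\u00e9t\u00e9",      # e with acute accent, still in the BMP
    "cjk": "\u65e5\u672c",            # CJK ideographs, still in the BMP
    "non-bmp": "\U0001F600",          # outside the BMP
}

results = {
    name: (
        first_unit_is_whole_char(text, "utf-8", 1),
        first_unit_is_whole_char(text, "utf-16-le", 2),
    )
    for name, text in samples.items()
}

for name, (utf8_ok, utf16_ok) in results.items():
    print(f"{name:9} utf8 ok: {utf8_ok!s:5}  utf16 ok: {utf16_ok}")
```

With UTF-8 the naive access already fails on the accented sample, so it
gets noticed during ordinary testing. With UTF-16 it only fails on the
last sample, which a developer may not try for a long time.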

There are use cases where UTF-16 beats UTF-8 and vice versa.

But the "easier to learn" argument is bogus. Trying to hide a problem
and hoping it will not surface has never been a good idea; why should
it be one here?



