[fpc-devel] Unicode support (yet again)

cobines cobines at gmail.com
Fri Sep 16 07:33:23 CEST 2011


2011/9/16 Martin <lazarus at mfriebe.de>:
> On 16/09/2011 00:03, cobines wrote:
>>
>> 2011/9/15 Hans-Peter Diettrich<DrDiettrich1 at aol.com>:
>>>
>>> cobines schrieb:
>>>>
>>>> When doing:
>>>> MyChar := MyString[1]
>>>>
>>>> appropriate function retrieves first unicode character, regardless of
>>>> encoding.
>>>
>>> This is just wrong :-(
>>>
>>> MyString[1] accesses the first element of the *physical* character array,
>>> regardless of any encoding. Also Length returns the array size, not the
>>> number of *logical* characters in it.
>>
>> Right. My point was that if I come from Ansi knowing MyString[1] retrieves
>> the first character and know nothing about Unicode, I might still think it
>> continues to retrieve the first character in Unicode regardless of string
>> encoding (the RTL handles that). It is, as you say, wrong, hence the
>> need for the developer to adapt the code if he uses such access, but people
>> might not know this. Having a UTF-16 RTL might help them in the sense
>> that they will never have to learn, until they deal with characters
>> outside of the BMP.
>>
> Which means they will have to learn it immediately.
>
> That is, of course, unless the application has no user input at
> all. As soon as text input from a user is processed, never mind what
> language the user speaks, this user may for some reason enter text where
> string[x] will return half a char (a surrogate).
>
> If it was UTF-8, the developer would probably encounter the error fairly
> soon, and learn before creating tons of wrong code.
> With UTF-16, the developer may get away with it for many months, creating
> tons of code that he then needs to correct.
>
> There are use cases where UTF-16 beats UTF-8 and vice versa.
>
> But the argument "easier to learn" is a fake. Trying to hide a problem and
> hoping it will not surface has never been a good idea, so why should it be here?

I agree with you. Yet I don't think anyone is trying to hide a
problem. It is the responsibility of each developer to learn before
they use Unicode; FPC should not have to adopt a particular encoding
just so that developers are forced to learn sooner rather than later.
From what I understand, the argument is not "easier to learn" but
"easier to transition to from Ansi if you don't care to learn". It's
not an argument for me, but I'm sure there are other reasons as well.
And since no encoding is easier to learn, it shouldn't matter which
one is chosen.
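
To make the difference Martin describes concrete, here is a rough sketch
that counts code units per character in both encodings. It is written in
Python only because that makes the encodings explicit; it is not FPC or
RTL code, and the helper names (code_units, index_gives_whole_char) are
made up for illustration. It assumes that MyString[1] and Length operate
on code units, as Hans-Peter described above.

```python
# Sketch only: code_units and index_gives_whole_char are hypothetical helpers,
# not part of any RTL. "utf-16-le" is used so that no BOM is counted.

UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2}  # bytes per code unit


def code_units(text, encoding):
    """Number of code units `text` occupies in `encoding`.

    This is what a Length that reports the physical array size would return.
    """
    return len(text.encode(encoding)) // UNIT_SIZE[encoding]


def index_gives_whole_char(text, encoding):
    """True if the first code unit alone is the whole first character."""
    return code_units(text[0], encoding) == 1


SAMPLES = {
    "ASCII letter A": "A",
    "e with acute, U+00E9 (inside the BMP)": "\u00e9",
    "musical G clef, U+1D11E (outside the BMP)": "\U0001D11E",
}

if __name__ == "__main__":
    for label, ch in SAMPLES.items():
        for enc in UNIT_SIZE:
            print(label, enc, code_units(ch, enc), index_gives_whole_char(ch, enc))
```

With UTF-8 the indexing already returns a partial character for U+00E9,
while with UTF-16 it only does so for U+1D11E. That is why the UTF-16
developer can go on for a long time without noticing.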

--
cobines


