[fpc-devel] Unicode support (yet again)

Thu Sep 15 20:43:26 CEST 2011

On 15/09/2011 19:36, Hans-Peter Diettrich wrote:
> Martin schrieb:
>> On 15/09/2011 10:38, Michael Schnell wrote:
>>> On 09/15/2011 11:06 AM, Graeme Geldenhuys wrote:
>>>>
>>>> and to show you AGAIN how flawed your "direct index access to a
>>>> character" example is.
>>> It's not "my" intent to use it. I'll never use it as I do know that 
>>> it is bound to create problems. But it is what generations of pascal 
>>> programmers are trained to do. They all need to be re-trained. In 
>>> fact this is just "Syntax-Candy" (as here native Array-syntax is 
>>> used for a non-array type). So it could be removed or modified to 
>>> better support the expectation of the "generations of pascal 
>>> programmers" even in times of Unicode.
>>
>> Which imho makes utf8 far more preferable than utf16
>
> Just the contrary is true. UTF-16 extends the range of usable 
> charsets, with only changing the size of a char.
Thank you for ripping the above out of context. Now that it stands 
outside its original context => you are right, now the contrary is true.
Of course, when it still was in its context.....

>
>> in UTF8 the error is bound to happen far easier, which gives people a 
>> far better chance to catch it before release, even before creating too 
>> much code relying on the buggy implementation.
>> Certain errors will be made, the importance is not, to create an 
>> environment in which they can not be made. The importance is to 
>> create an environment in which they will be caught as early as possible.
>
> This is true only for formal errors, not for all the other problems 
> with foreign languages. The assumption, that Unicode will turn every 
> coder into a linguist, capable of dealing with all (human) languages, 
> is simply wrong.

This is true as far as the comparison between utf8 and utf16 goes. All 
other problems were ignored, because the point was ONLY to compare 
the two. Any other alternative, or any change caused by either of 
them, was not part of the statement.

         Utf16 does have surrogate pairs, which it needs for chars not in the BMP

That means any developer who is serious about their work must implement 
the handling of those. Which makes the effort comparable to utf8.
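
To make the surrogate-pair point concrete, here is a minimal sketch (in 
Python, for brevity) that counts how many code units one character needs 
in each encoding. The helper name code_units and the sample characters 
are my own assumptions, chosen only for illustration.

```python
def code_units(ch: str) -> tuple:
    """Return (UTF-8 bytes, UTF-16 16-bit units) needed for one code point."""
    utf8_len = len(ch.encode("utf-8"))
    # utf-16-le avoids the BOM; each UTF-16 code unit is 2 bytes.
    utf16_len = len(ch.encode("utf-16-le")) // 2
    return utf8_len, utf16_len


# Sample characters (illustrative): ASCII, Latin-1, BMP, outside the BMP.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    utf8_len, utf16_len = code_units(ch)
    print(f"U+{ord(ch):04X}: utf8={utf8_len} utf16={utf16_len}")
```

Only the last sample (U+1D11E, outside the BMP) needs two units in utf16, 
while every non-ASCII sample already needs more than one unit in utf8.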

Yes indeed you are right, the chances that something goes wrong if a 
developer doesn't deal with them are considerably lower in utf16 than 
in utf8 (but they do exist) => that was exactly my point.

If there is a likelihood of a certain error being made (by accident), 
then it is better if the chances are high that you notice the error => 
otherwise you ship your software with the error, and spend a long time 
figuring out the bug reports from your users.
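
As a rough sketch of that argument (again Python, with made-up test 
strings), the check below asks whether the "one code unit = one char" 
assumption gives a wrong answer for a given input. The function name 
bug_visible and the samples are hypothetical, not taken from any real 
test suite.

```python
def bug_visible(text: str, encoding: str, unit_size: int) -> bool:
    """True if counting code units miscounts the characters in text."""
    units = len(text.encode(encoding)) // unit_size
    # len(text) counts code points, i.e. what the programmer expects.
    return units != len(text)


# Illustrative inputs: plain ASCII, typical European text, a non-BMP char.
samples = {
    "ascii": "hello",
    "latin": "caf\u00e9",
    "non-BMP": "clef \U0001D11E",
}

for name, text in samples.items():
    in_utf8 = bug_visible(text, "utf-8", 1)
    in_utf16 = bug_visible(text, "utf-16-le", 2)
    print(f"{name}: utf8 shows bug={in_utf8}, utf16 shows bug={in_utf16}")
```

With these samples the wrong assumption already fails on the "latin" 
input in utf8, but in utf16 it only fails once a non-BMP char turns up, 
which is far less likely to be in anybody's test data.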

Of course, if a programmer INTENTIONALLY puts this error in his code, 
hoping to get away with it, then utf16 is the better choice.
I cannot and will not, however, accept such bad practice as an argument 
for the case.

>
>> And besides that "generations of pascal programmers". Well the older 
>> ones are (or at least should be) experienced => they should have no 
>> problem learning it. programming requires a developer to keep up to 
>> date. (I for example have long given up to directly access the video 
>> memory of a VGA adapter, despite my generation was trained to do so)
>>
>> As for "newbies" => they come with all kind of wrong experiences. A 
>> lot of them come from VBA, and yet we aren't discussing introduction 
>> of VBS like object instantiation? ( variable.Create instead of 
>> variable:=class.create).
>> so again, utf8 is good, they make the error, they see the error, they 
>> learn.
>
> VBA uses UTF-16, proving your assumptions wrong again.

Please re-read my statement. Your statement seems to confirm, not 
contradict, mine.