[fpc-devel] Unicode support (yet again)

waldo kitty wkitty42 at windstream.net
Sat Sep 17 03:10:13 CEST 2011


On 9/15/2011 19:03, cobines wrote:
> 2011/9/15 Hans-Peter Diettrich<DrDiettrich1 at aol.com>:
>> cobines schrieb:
>>> When doing:
>>> MyChar := MyString[1]
>>>
>>> appropriate function retrieves first unicode character, regardless of
>>> encoding.
>>
>> This is just wrong :-(
>>
>> MyString[1] accesses the first element of the *physical* character array,
>> regardless of any encoding. Also Length returns the array size, not the
>> number of *logical* characters in it.
>
> Right. My point was if I come from Ansi knowing MyString[1] retrieves
> first character and know nothing about Unicode, I might still think it
> continues to retrieve first character in Unicode regardless of string
> encoding

+100000000000000000~

this is something that i'm having to deal with after 30+ years of pascal 
programming... i'm still trying to wrap my head around this GUI coding stuff... 
while i do have some similar experiences with other languages from way back 
(dBIII, dBIV and such that have/had forms) forms style coding is still alien to 
me... i'm used to simply clearing an 80x25 screen and then drawing my next 
screen... if i need to easily return to a previous screen, i might redraw it or 
i might restore it from a saved buffer and "blit" it back onto the screen...

i don't know the difference between thisstring and thatstring and there are 
times that this is one of the worst problems i face... my first hurdle was 
clearing the 255 character strings that i'm so used to dealing with... it used 
to be that i used a custom-written ASCIIZ converter routine but now it seems 
that these are in the run time libraries and i need only to choose the proper 
strings to convert between... even then, it can be quite the chore :?


> (RTL handles that). It is as you say wrong, therefore the
> need to adapt the code by developer if he uses such access, but people
> might not know this. Having UTF-16 RTL might help them in a sense
> they will never have to learn, until they deal with characters
> outside of the BMP.

more old school stuff here... a BMP is a windows style graphic... what are you 
guys calling a BMP???

and agreeing (fully!) there really should be some sort of "hidden(?)" overrides 
taking place so that folks like myself don't really have to worry about this 
stuff... but then again, maybe? i dunno? i'm not sure, these days, after a 
year+, if i'm on the right track or not...

>>> Whether it's utf8, utf16, utf32 or any other future encoding the code
>>> should work the same.
>>
>> Very new functions are required for dealing with *logical* characters, in
>> every MBCS encoding.
>
> Hence the need to remove indexed access like MyString[1].

removing that is not a GoodThing<tm> is it? really? as i say above, it would 
seem to me to be best for the library to handle all of this stuff in the same 
way that borland handled similar things way back when... on the one hand, it 
would seem easy enough to handle with automatic overrides but then again, i may 
not be thinking of things in the same way as others used to working with this 
object-oriented method of coding...
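
for reference, here is a rough sketch of the physical-versus-logical distinction quoted above. it assumes the string holds UTF-8 bytes in an AnsiString. Utf8CodePointCount is a hypothetical helper written for illustration, not an RTL routine, and it counts code points only (it does not validate the input or handle combining sequences).

```pascal
program Utf8IndexSketch;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the RTL): counts UTF-8 code points
// by skipping continuation bytes, which have the bit pattern 10xxxxxx.
// Assumes S contains well-formed UTF-8; no validation is done.
function Utf8CodePointCount(const S: AnsiString): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  S: AnsiString;
begin
  // Build "é" followed by "a" byte by byte: "é" is $C3 $A9 in UTF-8.
  SetLength(S, 3);
  S[1] := Chr($C3);
  S[2] := Chr($A9);
  S[3] := 'a';

  WriteLn(Length(S));              // physical size: 3 bytes
  WriteLn(Utf8CodePointCount(S));  // logical size: 2 code points
  WriteLn(Ord(S[1]));              // 195 ($C3): first byte, not a whole character
end.
```

so S[1] here is only the lead byte of "é", which is the trap for anyone coming from single-byte strings.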



