[fpc-pascal] Re: Widestrings length and character iteration
Christos Chryssochoidis
c.chryssochoidis at gmail.com
Wed May 9 00:23:46 CEST 2007
Daniël Mantione wrote:
>
> On Mon, 7 May 2007, Christos Chryssochoidis wrote:
>
>> Daniël Mantione wrote:
>>> Not possible, a widestring is UCS-2/UTF-16.
>> I defined a widestring with 7 characters (code points), and the length()
>> function returned the value 15. Of the 7 code points of that widestring only
>> one of them was greater than $07FF (the maximum code point which can be
>> encoded in 2 bytes under UTF-8). When I changed that character with another
>> one with code not greater than $07FF, length() returned value 14... I also
>> printed the byte values of one of the widestring's widechars, and the values
>> printed indicated UTF-8 encoding.
>
> Yes, the program output is utf-8 on OS-X, because this is the native
> encoding for OS-X. However, widestrings are not utf-8. Can you show your
> code?
>
> Daniël
OK, I figured out what happened. The source file was saved in UTF-8
encoding, but I had not put the compiler directive {$CODEPAGE UTF8} in
it. Without the directive, the compiler presumably treated every byte of
the UTF-8 string literal as a separate character, which would explain
why length() was counting bytes (15 and 14) instead of characters (7).
After adding the directive, almost everything worked: length() returned
the correct number of Unicode characters, and subscripting the
widestring returned the correct character. However, since widechar and
widestring are encoded in UTF-16, as you said, while my Mac OS X console
uses UTF-8, the output was only displayed correctly once I wrapped the
individual widechars or the whole widestring in utf8encode() before
passing them to write().
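For the archive, here is a minimal sketch of the setup described above. It
assumes the source file is saved as UTF-8 and the terminal expects UTF-8.
The program name, the variable names and the sample string are made up for
illustration (the string is not the one from my original test, it just has
the same shape: six characters at or below $07FF and one above it). Only
{$codepage utf8}, length() and utf8encode() come from the discussion.
Newer compiler versions may handle string output differently, so treat
this as illustrative rather than definitive.

```pascal
program widedemo;

{$mode objfpc}{$H+}
{$codepage utf8}  { the source file is UTF-8; without this line the
                    literal below is read byte by byte }

var
  s: WideString;   { UTF-16 encoded }
  ch: WideString;  { holds one element of s for output }
  i: Integer;
begin
  { Hypothetical sample: six Greek letters plus the euro sign (U+20AC). }
  s := 'αβγδεζ€';

  { Counts UTF-16 code units, not UTF-8 bytes. }
  WriteLn('length = ', Length(s));

  { Subscripting yields a WideChar; convert each one to UTF-8
    before writing it to a UTF-8 console. }
  for i := 1 to Length(s) do
  begin
    ch := s[i];
    WriteLn(i, ': ', UTF8Encode(ch));
  end;

  { The whole string can be converted in one call as well. }
  WriteLn(UTF8Encode(s));
end.
```

With the directive in place, length() should report 7 for this string.
Removing the directive is a quick way to see the byte-counting behaviour
again.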
Thanks for your help,
Christos