[fpc-devel] assign constant text to widestring
Michael Schnell
mschnell at lumino.de
Wed Oct 22 15:36:22 CEST 2008
Hi Experts,
There has been a long-winded discussion of this in the "German Lazarus
Forum", and I was very dissatisfied with the result.
Maybe this has already been discussed in one of the "Unicode" threads
here, but I did not follow all of them down to the latest twig and leaf.
So I am starting a new thread, hoping for a more comprehensive result.
When using UTF8String I found that if s is a UTF8String containing
"ö2", length(s) is 3 and s[3] is "2". Evidently the content of a
UTF8String is counted in 8-bit code units, not in "visible" characters.
While I don't like this "un-String-like" behavior at all, I am aware
that this is by design to guarantee a decent speed.
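
The byte counting described above can be reproduced outside Pascal. The
following is only a sketch in Python, used here as a neutral way to look
at the raw UTF-8 bytes; the variable names are made up for illustration
and nothing in it is FPC code.

```python
# Illustration only: Python bytes stand in for the content of a UTF8String.
text = "\u00f62"                      # the two visible characters "ö2"
utf8_bytes = text.encode("utf-8")     # the 8-bit code units a UTF8String would hold

byte_length = len(utf8_bytes)         # corresponds to Length(s) in the post
third_unit = utf8_bytes[2:3]          # corresponds to s[3] (Pascal strings are 1-based)

print(byte_length)                    # 3
print(utf8_bytes.hex(" "))            # c3 b6 32
print(third_unit)                     # b'2'
```

"ö" (U+00F6) takes the two bytes C3 B6 in UTF-8, which is why the "2"
ends up at position 3.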
But happily we don't need to use UTF8Strings to handle Unicode, as we
have WideStrings, which show this odd behavior only when we try to
store very unusual characters (code points above $FFFF) using "surrogate
pairs". I feel that I am very unlikely ever to need to do this.
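
To make the surrogate-pair remark concrete, here is a small sketch (again
in Python, purely as an illustration) that counts 16-bit code units the
way a UTF-16 string would store them. The helper name utf16_units and the
sample character U+1D11E are my own choices, not anything from FPC.

```python
def utf16_units(text: str) -> int:
    """Return the number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2

bmp_units = utf16_units("\u00f62")         # "ö2": both characters are below $FFFF
astral_units = utf16_units("\U0001D11E")   # one character above $FFFF

print(bmp_units)      # 2 -> one code unit per visible character
print(astral_units)   # 2 -> a single character stored as a surrogate pair
```

Only characters above $FFFF make the element count differ from the
character count.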
So I did some tests with WideStrings and found strange things with them,
too. While some of them are Lazarus issues, one is quite obviously
introduced by the compiler.
When I want to assign the constant text "ö2" to a WideString, I would
expect simply to write s := 'ö2'; . But I found that this does not work:
it creates a WideString of length 3 that contains the three 8-bit code
units of the UTF-8-encoded string "ö2", each zero-extended to 16 bits
and stored in one WideChar element. To me this is very surprising, and
it is incompatible with the same code (s := 'ö2'; ) in a Turbo-Delphi
program.
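
The observed content of the WideString can be modelled as follows. This
is a sketch of the result as described above, not of the compiler's
actual code path; Python integers stand in for WideChar elements.

```python
# Model of the observed result: every UTF-8 byte becomes one 16-bit element.
utf8_bytes = "\u00f62".encode("utf-8")            # C3 B6 32

widened = [b for b in utf8_bytes]                 # zero-extension keeps the numeric value
as_text = "".join(chr(unit) for unit in widened)  # what those elements display as

print(len(widened))                               # 3
print([f"${unit:04X}" for unit in widened])       # ['$00C3', '$00B6', '$0032']
print(as_text)                                    # Ã¶2
```

So instead of the two elements $00F6 $0032, the string holds three
elements that display as "Ã¶2".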
Obviously - unlike Turbo-Delphi, which uses ANSIString here - a
constant string gets UTF8String as its intermediate type. This might be
a useful definition, but if it is done this way, why does an assignment
WideString := UTF8String not implicitly call UTF8Decode as a type
conversion? In my example it calls fpc_ansistr_to_widestr instead,
just as if the UTF8String were an ANSIString.
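
The difference between the two conversions can be sketched like this.
The assumptions are loud ones: Python's UTF-8 decoder plays the role of
UTF8Decode, and a Latin-1 decode is used only as a stand-in for the
byte-wise widening seen in the test. It is not a claim about how
fpc_ansistr_to_widestr is implemented.

```python
raw = "\u00f62".encode("utf-8")            # bytes of the constant: C3 B6 32

# Route 1: interpret the bytes as UTF-8 (the role UTF8Decode would play).
decoded_as_utf8 = raw.decode("utf-8")

# Route 2: widen each byte separately (stand-in for the observed conversion).
widened_bytewise = raw.decode("latin-1")

print(len(decoded_as_utf8), [hex(ord(c)) for c in decoded_as_utf8])
# 2 ['0xf6', '0x32']
print(len(widened_bytewise), [hex(ord(c)) for c in widened_bytewise])
# 3 ['0xc3', '0xb6', '0x32']
```

Only the first route gives back the two characters that were typed.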
Is there some compiler setting to change this?
-Michael