[fpc-devel] assign constant text to widestring

Martin Friebe fpc at mfriebe.de
Wed Oct 22 17:01:56 CEST 2008



Michael Schnell wrote:
> Hi Experts,
[.....]
> When I want to simply assign a constant text "ö2" to a WideString I 
> would think that I just write s := 'ö2'; . But I found that this does 
> not work, but that it creates a WideString of length 3 that contains 
> the three 8-Bit subcodes of the utf8-coded string "ö2", zero-extended 
> to 16 Bits, each in one WideChar element. For me this is very 
> surprising and incompatible to the same code (s := 'ö2'; ) used in a 
> Turbo-Delphi program.
>
> Obviously - unlike Turbo-Delphi, which uses ANSIString here - a 
> constant string gets UTF8String as its intermediate type. This might 
> be a useful definition, but if it is done this way, why does an 
> assignment WideString := UTF8String not implicitly call UTF8Decode as 
> a type conversion? In my example it calls fpc_ansistr_to_widestr 
> instead, just as if the UTF8String were an ANSIString.
>
I am not an expert, but here is what I believe I know:

This is the result of 2 (hidden) "features":

AFAIK the compiler reads the source as non-UTF-8 (Latin-1 or some other 
8-bit encoding). This has other consequences too, e.g. identifiers cannot 
contain UTF-8 characters.

To the compiler, the string within the quotes is a byte sequence, and 
the compiler does not know that it is UTF-8. From your description I take 
it that the compiler translates those 3 "8-bit chars" into 16-bit chars 
one by one (whether this translation is correct for the 8-bit source 
encoding is another question).
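
To illustrate the effect (not the compiler's actual code path), here is a 
small Python sketch of what Michael describes: the UTF-8 bytes of "ö2" are 
widened byte by byte instead of being decoded. The helper name widen_bytes 
is made up for this example, and plain zero-extension is an assumption 
taken from his description.

```python
def widen_bytes(data: bytes) -> str:
    """Zero-extend every byte to its own character (one 16-bit unit each)."""
    return "".join(chr(b) for b in data)


text = "\u00f6" + "2"               # the constant "ö2"
utf8_bytes = text.encode("utf-8")   # what a UTF-8 source file contains

widened = widen_bytes(utf8_bytes)   # bytes treated as single 8-bit chars
decoded = utf8_bytes.decode("utf-8")  # what a UTF-8 aware conversion gives

print(len(utf8_bytes), len(widened), len(decoded))
print([hex(ord(c)) for c in widened])
```

The widened string has three elements (C3, B6, 32), matching the length 3 
reported above, while proper decoding gives the expected two characters.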

Lazarus uses UTF-8 for everything, so it will save your source, and with 
it your string constant, as UTF-8. If your string were kept as an 
AnsiString, the compiler would treat it as bytes and pass it through 
unchanged, so any code expecting UTF-8 would be fine.

You can try telling Lazarus to save your file as Latin-1. As long as all 
your strings fit into Latin-1, this may work; if and only if the compiler 
translates the Latin-1 bytes into the correct WideChars.
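
Why Latin-1 might work can again be sketched in Python, under the same 
assumption that each source byte is simply zero-extended to 16 bits. 
Latin-1 byte values coincide with the first 256 Unicode code points, so 
the widened bytes are already the right characters. widen_bytes and 
fits_latin1 are hypothetical helpers for this example only.

```python
def widen_bytes(data: bytes) -> str:
    """Zero-extend every byte to its own character (one 16-bit unit each)."""
    return "".join(chr(b) for b in data)


def fits_latin1(s: str) -> bool:
    """True if every character of s can be stored as one Latin-1 byte."""
    try:
        s.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


text = "\u00f6" + "2"                  # the constant "ö2"
latin1_bytes = text.encode("latin-1")  # what a Latin-1 source file contains

print(len(latin1_bytes), widen_bytes(latin1_bytes) == text)
print(fits_latin1(text), fits_latin1("\u20ac"))  # U+20AC is the euro sign
```

Here the widened result equals the original two-character string, but a 
character such as the euro sign cannot be stored in a Latin-1 file at all.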

It will not work for anything that does not fit into Latin-1. AFAIK 
Lazarus currently doesn't save in UCS-2 (or any 16-bit encoding). But even 
if Lazarus did, since the compiler wants an 8-bit encoding, your whole 
source would be broken.

Not much help, I know. Maybe someone else has more ideas / knowledge.

> Is there some compiler setting to change this ?
>
> -Michael