[fpc-devel] TRegistry and Unicode

Yuriy Sydorov jura at cp-lab.com
Thu Mar 7 18:30:31 CET 2019


On 07.03.2019 18:38, Bart wrote:
> On Wed, Mar 6, 2019 at 10:09 PM Yuriy Sydorov <jura at cp-lab.com> wrote:
> 
>> If you declare a function result as utf8string instead of string (ansistring), then an automatic conversion will be
>> performed when you assign the result of the function to a variable of type string (ansistring). You will get a classic
>> 1-byte-per-character string if your current encoding is a 1-byte encoding.
>> I mentioned this earlier.
> 
> I know that, but you do not need to assign the function result to
> another string to investigate it.
> Stupid example:
> program test;
> 
> function x: utf8string;
> var
>    u: unicodestring;
> begin
>    setlength(u,3);
>    word(u[1]) := $E4; //my editor is UTF8, hence this workaround instead of u := 'äëï';
>    word(u[2]) := $EB;
>    word(u[3]) := $EF;
>    result := utf8encode(u); //äëï but now Utf8Encoded
> end;
> 
> var
>    u8: utf8string;
> begin
>    u8 := x;
>    if byte(u8[1]) = $E4 then writeln('OK') else writeln('Fail');
> end.
> 
> It prints Fail, whereas it would have printed OK if x had returned string.
> 
> This is a corner case, but it definitely is a regression nevertheless.

Of course, if "u8" is a utf8string, then the first char will be encoded as a 2-byte pair ($C3 $A4 for "ä"). But if you change "u8" to
be just "string" or "ansistring", then the first byte will contain "ä" if the current ANSI code page supports it (e.g. cp1252).
It is perfectly backward compatible.
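A minimal sketch of both cases, based on Bart's program above. Assumptions: FPC 3.x in objfpc mode, and a hypothetical type Cp1252String (an ansistring pinned to code page 1252) standing in for plain "string" on a system whose ANSI code page is cp1252, so the result does not depend on the machine's locale. On Unix the conversion needs a widestring manager, hence cwstring in the uses clause; the "OK" result is only expected where cp1252 conversion is available (Windows, or Unix with iconv support).

```pascal
program cpdemo;

{$mode objfpc}{$H+}

uses
  {$ifdef unix}cwstring,{$endif} // widestring manager for code page conversion on Unix
  SysUtils;

type
  // Hypothetical type for this sketch: ansistring with a fixed cp1252 code page,
  // standing in for "string" when the ANSI code page is cp1252.
  Cp1252String = type AnsiString(1252);

function x: utf8string;
var
  u: unicodestring;
begin
  SetLength(u, 3);
  u[1] := WideChar($E4); // 'ä'
  u[2] := WideChar($EB); // 'ë'
  u[3] := WideChar($EF); // 'ï'
  Result := UTF8Encode(u);
end;

var
  u8: utf8string;
  s: Cp1252String;
begin
  u8 := x;
  // Same code page, no conversion: u8 holds UTF-8 bytes, 'ä' is the pair $C3 $A4.
  writeln('utf8string: length ', Length(u8), ', first bytes $',
    IntToHex(Byte(u8[1]), 2), ' $', IntToHex(Byte(u8[2]), 2));

  s := x;
  // Different code page: automatic conversion to 1 byte per character, 'ä' = $E4.
  if Byte(s[1]) = $E4 then writeln('OK') else writeln('Fail');
end.
```

Bart's check fails only because it inspects the utf8string directly; once the result lands in a 1-byte ansistring, the old byte-level test passes again.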

Yuriy.
