[fpc-devel] TRegistry and Unicode

Tue Feb 26 11:04:03 CET 2019

On Tue, 26 Feb 2019, Marco van de Voort wrote:

>
> On 2/25/2019 at 9:27 PM, Michael Van Canneyt wrote:
>>> I'm currently involved in some TRegistry bugs and regressions.
>>> Personally I don't use TRegistry in any of my programs.
>>> Also I mostly use Lazarus, so most of the issues don't affect me.
>>>
>>> However I would like to share some observations and thoughts.
>>>
>>> TRegistry on Windows now (3.2+) uses Unicode API.
>>> String input parameters in the various methods get "promoted" to
>>> Unicode and then the API is called.
>>> Returned string values however are mostly encoded in UTF8, by
>>> explicitly calling Utf8Encode(SomeUnicodeString).
>>> Is that (enforce UTF8 encoding) by design?
>>> (The Ansi to Unicode was done via UTF8Decode which is definitely
>>> wrong and is fixed by now.)
>>>
>>> On Lazarus, this is no problem, since by default all strings are UTF8
>>> encoded, so all conversions are lossless.
>>
>> I think Lazarus users are the main TRegistry users, so I would keep the
>> current behaviour for the public API. Where possible add overloads that
>> use a unicodestring, and let the UTF8 one call the unicode one.
>
> The current situation does not improve anything for Lazarus users that 
> set the default encoding to utf8 (aka utf8hack)
>
> If I look into e.g. registry.pp, the only use of utf8encode there is  
> like this:
>
> var
>   s: string;
>   u: unicodestring;
>
> s := utf8encode(u);
>
> which, IF lazarus is used in the default utf8 mode is equivalent to
>
>
> s:=u;
>
> So currently this utf8encode only complicates matters for people
> that don't set the default codepage to utf8?
>
> If I'm wrong, what is the exact behaviour that you want to keep?

If I understood the OP correctly, he wants to change the use of "string"
arguments in the public API to unicodestring.

That changes a lot.

Contrary to popular belief, the conversion will not automatically be
correct, and will produce errors.

(See e.g. https://bugs.freepascal.org/view.php?id=35113
for a similar situation where part of the error is that the lazarus
user must explicitly call Utf8Decode.)
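
As a minimal sketch of why the conversion is not automatically correct:
assume a string that holds UTF-8 bytes but is tagged with the system code
page. The retagging with SetCodePage below is only a way to simulate that
situation; it is not taken from the bug report, and the program and
variable names are made up for illustration.

```pascal
program ConversionSketch;

{$mode objfpc}{$H+}

var
  U, FromImplicit, FromExplicit: UnicodeString;
  S: string;
begin
  // Build 'caf' followed by U+00E9 without relying on the source encoding.
  U := 'caf';
  U := U + WideChar($00E9);

  // S now holds the UTF-8 bytes of U.
  S := UTF8Encode(U);

  // Assumption for the demo: retag S as "system code page" without
  // touching the bytes, to mimic UTF-8 data kept in a plain string.
  SetCodePage(RawByteString(S), CP_ACP, False);

  // Implicit conversion: follows the string's code page tag, which is
  // now the system default, not necessarily UTF-8.
  FromImplicit := S;

  // Explicit conversion: the bytes are always interpreted as UTF-8.
  FromExplicit := UTF8Decode(S);

  WriteLn('DefaultSystemCodePage:  ', DefaultSystemCodePage);
  WriteLn('explicit round trip ok: ', FromExplicit = U);
  WriteLn('implicit round trip ok: ', FromImplicit = U);
end.
```

Whether the implicit line reports TRUE depends on the platform, the
installed widestring manager and DefaultSystemCodePage, which is why it
cannot be relied on.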

So my proposal is to leave the public API as-is, using string, adding
unicode string overloads where possible/useful.

Internally, convert to whatever fits best.

If the internal routines are easier to maintain/understand if they use
unicode string throughout: refactor them to use unicode.
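
The proposal could look roughly like the sketch below. TDemoStore is a
hypothetical stand-in and not the real TRegistry; it only shows string
overloads delegating to unicodestring ones, with the string side treated
as UTF8 as in the current behaviour.

```pascal
program OverloadSketch;

{$mode objfpc}{$H+}

type
  { Hypothetical class for illustration only, not the real TRegistry. }
  TDemoStore = class
  private
    FName, FValue: UnicodeString;  // internal data is unicode throughout
  public
    procedure WriteString(const Name, Value: UnicodeString); overload;
    procedure WriteString(const Name, Value: string); overload;
    function ReadString(const Name: UnicodeString): UnicodeString; overload;
    function ReadString(const Name: string): string; overload;
  end;

procedure TDemoStore.WriteString(const Name, Value: UnicodeString);
begin
  FName := Name;
  FValue := Value;
end;

procedure TDemoStore.WriteString(const Name, Value: string);
begin
  // The string overload assumes UTF-8 input and calls the unicode one.
  WriteString(UTF8Decode(Name), UTF8Decode(Value));
end;

function TDemoStore.ReadString(const Name: UnicodeString): UnicodeString;
begin
  if Name = FName then
    Result := FValue
  else
    Result := '';
end;

function TDemoStore.ReadString(const Name: string): string;
begin
  // Public string API keeps returning UTF-8, as it does today.
  Result := UTF8Encode(ReadString(UTF8Decode(Name)));
end;

var
  Store: TDemoStore;
  KeyU, ValU: UnicodeString;
  KeyA, ValA: string;
begin
  KeyU := 'Title';
  KeyA := 'Title';
  ValU := 'caf';
  ValU := ValU + WideChar($00E9);

  Store := TDemoStore.Create;
  try
    Store.WriteString(KeyU, ValU);   // unicode overload
    ValA := Store.ReadString(KeyA);  // string overload, UTF-8 result
    WriteLn('UTF-16 code units: ', Length(ValU));
    WriteLn('UTF-8 bytes:       ', Length(ValA));
  finally
    Store.Free;
  end;
end.
```

Callers that already hold a unicodestring skip the UTF8 step entirely,
and existing callers that pass string keep the behaviour they have now.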

Michael.