[fpc-devel] TRegistry and Unicode
Bart
bartjunk64 at gmail.com
Mon Feb 25 19:03:48 CET 2019
Hi,
I'm currently involved in some TRegistry bugs and regressions.
Personally I don't use TRegistry in any of my programs.
Also I mostly use Lazarus, so most of the issues don't affect me.
However I would like to share some observations and thoughts.
TRegistry on Windows now (3.2+) uses Unicode API.
String input parameters in the various methods get "promoted" to
Unicode and then the API is called.
Returned string values however are mostly encoded in UTF8, by
explicitly calling Utf8Encode(SomeUnicodeString).
Is that (enforce UTF8 encoding) by design?
(The Ansi to Unicode conversion was done via UTF8Decode, which is
definitely wrong and is fixed by now.)
On Lazarus, this is no problem, since by default all strings are UTF8
encoded, so all conversions are lossless.
In a plain fpc program though on Windows, default encoding is the
current codepage (cp1252 in my case) and information will get lost
when you process the result further.
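A minimal sketch of the kind of loss meant here. It does not call
TRegistry; FromApi merely stands in for a value coming back from the
Unicode API, and all names are made up for illustration. Whether the
Ansi round trip survives depends on DefaultSystemCodePage (and, on
non-Windows, on the installed widestring manager), so no output is
claimed:

```pascal
program RegLossSketch;
{$mode objfpc}{$H+}

var
  FromApi: UnicodeString;  // stands in for a string returned by the Unicode API
  AsUtf8: UTF8String;
  AsAnsi: AnsiString;      // encoded in DefaultSystemCodePage (e.g. cp1252)
begin
  // U+03A9 (Greek capital omega) has no representation in cp1252.
  FromApi := 'Key';
  FromApi := FromApi + WideChar($03A9);

  AsUtf8 := UTF8Encode(FromApi);  // UTF-8 can represent every Unicode character
  AsAnsi := AnsiString(FromApi);  // converted to the default system codepage

  WriteLn('DefaultSystemCodePage:   ', DefaultSystemCodePage);
  WriteLn('UTF8 round trip intact:  ', UTF8Decode(AsUtf8) = FromApi);
  WriteLn('Ansi round trip intact:  ', UnicodeString(AsAnsi) = FromApi);
end.
```

With a cp1252 default codepage I would expect the Ansi round trip to
fail, since the omega cannot be stored in that codepage.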
On non-Windows platforms all data in the registry is (supposed to be)
UTF8-encoded in an XML file.
Since the registry interfaces with a Unicode API (Windows) or UTF8 API
(all other platforms), would it maybe make sense to use UnicodeString
parameters throughout TRegistry? (UnicodeString because this is
primarily used on Windows, otherwise I'd suggest UTF8String.)
Now all conversions from and to UnicodeString are hidden from the
programmer, so he/she cannot know that data loss due to conversion may
occur.
Using UnicodeString parameters will make the caller aware (if he/she
uses AnsiStrings as parameters) that conversion will happen.
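A rough sketch of what that would look like at the call site.
ReadStringU is a hypothetical signature, not the current TRegistry
API, and its body is only a placeholder. The point is that the
conversions now happen where the caller (and the compiler) can see
them; FPC normally reports implicit string type conversions like
these as warnings:

```pascal
program ParamSketch;
{$mode objfpc}{$H+}

// Hypothetical UnicodeString-based signature, for illustration only.
function ReadStringU(const Name: UnicodeString): UnicodeString;
begin
  Result := Name;  // placeholder for a call into the Unicode API
end;

var
  A: AnsiString;
  U: UnicodeString;
begin
  A := 'ValueName';

  // Implicit AnsiString -> UnicodeString conversion of the argument.
  U := ReadStringU(A);

  // Implicit UnicodeString -> AnsiString conversion of the result:
  // this is where data loss can occur, and it is visible to the caller.
  A := ReadStringU(U);

  // Explicit casts state the intent and avoid the implicit conversions.
  A := AnsiString(ReadStringU(UnicodeString(A)));

  WriteLn(A);
end.
```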
Pros
- simpler and more consistent code in the Windows implementation of TRegistry
- awareness of conversion for the programmer
Cons
- people will complain about the warnings
- XMLReg implementation needs Utf8Encode/Utf8Decode (currently there is
no conversion there: even if system codepage is not UTF8, the XML file
claims it is, so this might be wrong as is)
- UnicodeStrings are slower (though my guess is that accessing the API
itself is slower than the Pascal code in the registry methods, so this
may hardly matter)
- We do not have a Unicode TStringList (for
ReadStringList/WriteStringList methods)
Whilst I know that hardly any fpc devel uses TRegistry, without
getting your thoughts and opinions on this matter it makes no sense to
suggest patches implementing such a big change.
--
Bart