[fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()
Ondrej Pokorny
lazarus at kluug.net
Thu Dec 26 21:12:08 CET 2019
On 26.12.2019 19:29, Michael Van Canneyt wrote:
> So no, I don't think these need to be changed/merged. What IMO can be
> discussed is
> which of these 2 need to be used as the default codepage in other
> code. It
> should then resolve the problems that appear, I think.
That would be possible as well. But still please reconsider it:
First reason: by convention, the default code page to use should be
TEncoding.Default. That is intuitive.
Second reason: we now have TEncoding.ANSI = TEncoding.Default, i.e. two
equal properties, plus the FPC-only property TEncoding.SystemEncoding.
That means 3 properties for 2 values.
---
In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI
On Windows, they are equal but on POSIX they are different:
TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from
CFLocaleGetIdentifier.
Also read the .NET docs about Encoding.Default:
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN&view=netframework-4.8#System_Text_Encoding_Default
On .NET Framework it is ANSI, but on .NET Core it is UTF-8, even on Windows.
With all the information from the docs, I am more and more convinced
that TEncoding.SystemEncoding is superfluous and TEncoding.Default
should take over its meaning: TEncoding.Default should reflect changes
in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed
ANSI code page. With it there is no need for TEncoding.SystemEncoding.
With this change, in the current Lazarus UTF-8 solution,
TEncoding.Default will be UTF-8. In the future Unicode and
Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi
meaning (ANSI/UTF-8). IMO the concept is very sensible.
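To see how the three properties relate on a given system, a minimal sketch like the following could be used. It assumes an FPC version that already ships TEncoding.SystemEncoding; the program name and the Report helper are made up for illustration. It prints the code pages before and after switching the default system code page to UTF-8, which is roughly what a UTF-8 setup such as Lazarus does at startup. No output is shown here because it depends on the platform and FPC version.

```pascal
program TrackDefaultCodePage;
{$mode objfpc}{$H+}
uses
  SysUtils;

// Hypothetical helper: prints the code page behind each property.
procedure Report(const Stage: string);
begin
  WriteLn(Stage);
  WriteLn('  DefaultSystemCodePage:    ', DefaultSystemCodePage);
  WriteLn('  TEncoding.SystemEncoding: ', TEncoding.SystemEncoding.CodePage);
  WriteLn('  TEncoding.Default:        ', TEncoding.Default.CodePage);
  WriteLn('  TEncoding.ANSI:           ', TEncoding.ANSI.CodePage);
end;

begin
  Report('Before:');
  // Change the default system code page to UTF-8.
  SetMultiByteConversionCodePage(CP_UTF8);
  Report('After SetMultiByteConversionCodePage(CP_UTF8):');
end.
```

Comparing the two reports shows which properties follow DefaultSystemCodePage and which stay fixed in the FPC version being tested.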
---
Btw. you have a bug in:
constructor TStringStream.CreateRaw(const AString: RawByteString);
var
  CP: TSystemCodePage;
begin
  CP:=StringCodePage(AString);
  if (CP=CP_ACP) or (CP=TEncoding.Default.CodePage) then // this line is wrong
    begin
    FEncoding:=TEncoding.Default;
    FOwnsEncoding:=False;
    end
  else
In the code above, TEncoding.Default is used if CP=CP_ACP. That is
currently wrong, and the bug perfectly reflects my suggestion for the
TEncoding.Default change. Currently, CP_ACP corresponds with
DefaultSystemCodePage and thus with TEncoding.SystemEncoding, not with
TEncoding.Default. TEncoding.Default corresponds with ANSI (which is not
CP_ACP, as documented at https://wiki.freepascal.org/FPC_Unicode_support ).
The code should be:
  if (CP=CP_ACP) or (CP=TEncoding.SystemEncoding.CodePage) then
    begin
    FEncoding:=TEncoding.SystemEncoding;
    FOwnsEncoding:=False;
    end
  else if (CP=TEncoding.Default.CodePage) then
    begin
    FEncoding:=TEncoding.Default;
    FOwnsEncoding:=False;
    end
  else
    // ...
The current CreateRaw code is correct for my suggestion. As you can see
you intuitively expected the approach I am suggesting :)
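For anyone who wants to observe which encoding CreateRaw actually picks, here is a small sketch. It assumes an FPC version where TStringStream has the CreateRaw constructor and the Encoding property; the program name and the Inspect helper are made up for illustration. It prints the dynamic code page of the input string next to the code page of the encoding the stream ended up with. The output is not shown because it depends on the platform and on which version of CreateRaw is compiled in.

```pascal
program InspectCreateRaw;
{$mode objfpc}{$H+}
uses
  SysUtils, Classes;

// Hypothetical helper: builds a stream from S and reports both code pages.
procedure Inspect(const Caption: string; const S: RawByteString);
var
  Stream: TStringStream;
begin
  Stream := TStringStream.CreateRaw(S);
  try
    WriteLn(Caption);
    WriteLn('  StringCodePage(S):        ', StringCodePage(S));
    WriteLn('  Stream.Encoding.CodePage: ', Stream.Encoding.CodePage);
  finally
    Stream.Free;
  end;
end;

var
  S: RawByteString;
begin
  S := 'abc';
  Inspect('As assigned:', S);
  // Retag the bytes as UTF-8 without converting them.
  SetCodePage(S, CP_UTF8, False);
  Inspect('Tagged as CP_UTF8:', S);
end.
```

Running it once with the default setup and once after SetMultiByteConversionCodePage(CP_UTF8) should make the CP_ACP branch discussed above visible.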
Ondrej