[fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()

Ondrej Pokorny lazarus at kluug.net
Thu Dec 26 21:12:08 CET 2019


On 26.12.2019 19:29, Michael Van Canneyt wrote:
> So no, I don't think these need to be changed/merged. What IMO can be
> discussed is which of these 2 need to be used as the default codepage
> in other code. It should then resolve the problems that appear, I think.

That would be possible as well, but please reconsider.
First reason: by convention, the default code page to use should be 
TEncoding.Default. That is the intuitive choice.
Second reason: we now have TEncoding.ANSI = TEncoding.Default, i.e. two 
properties with the same value, plus the FPC-only property 
TEncoding.SystemEncoding. That makes three properties for two values.
---

In Delphi TEncoding.ANSI and TEncoding.Default are actually different. See:
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.Default
http://docwiki.embarcadero.com/Libraries/Rio/en/System.SysUtils.TEncoding.ANSI

On Windows, they are equal but on POSIX they are different: 
TEncoding.Default is UTF-8 but TEncoding.ANSI is the code page from 
CFLocaleGetIdentifier.

See also the .NET docs about Encoding.Default:
https://docs.microsoft.com/en-us/dotnet/api/system.text.encoding.default?redirectedfrom=MSDN&view=netframework-4.8#System_Text_Encoding_Default
On .NET Framework it is ANSI, but on .NET Core it is UTF-8, even on Windows.

With all the information from the docs, I am more and more convinced 
that TEncoding.SystemEncoding is superfluous and TEncoding.Default 
should take over its meaning: TEncoding.Default should reflect changes 
in DefaultSystemCodePage. Whereas TEncoding.ANSI should stay a fixed 
ANSI code page. With it there is no need for TEncoding.SystemEncoding.

With this change, in the current Lazarus UTF-8 solution, 
TEncoding.Default will be UTF-8. In the future Unicode and 
Delphi-compatible FPC/Lazarus, TEncoding.Default will get the Delphi 
meaning (ANSI/UTF-8). IMO the concept is very sensible.
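
To compare the values on a given system, a small test program along these 
lines could be used. It is only a sketch: it assumes an FPC version in which 
TEncoding.SystemEncoding and the TEncoding.CodePage property are available, 
and the SetMultiByteConversionCodePage call is only my approximation of what 
the Lazarus UTF-8 setup does. The numbers printed depend on the platform and 
FPC version, so no expected output is given.

```pascal
program ShowEncodings;

{$mode objfpc}{$H+}

uses
  SysUtils;

// Print the code pages behind the three TEncoding properties discussed here,
// next to the current value of DefaultSystemCodePage.
procedure Report(const ATitle: string);
begin
  WriteLn(ATitle);
  WriteLn('  DefaultSystemCodePage    = ', DefaultSystemCodePage);
  WriteLn('  TEncoding.Default        = ', TEncoding.Default.CodePage);
  WriteLn('  TEncoding.ANSI           = ', TEncoding.ANSI.CodePage);
  WriteLn('  TEncoding.SystemEncoding = ', TEncoding.SystemEncoding.CodePage);
end;

begin
  Report('At startup:');

  // Assumption: roughly what the Lazarus UTF-8 setup does, i.e. switch
  // the default system code page to UTF-8 at run time.
  SetMultiByteConversionCodePage(CP_UTF8);

  Report('After SetMultiByteConversionCodePage(CP_UTF8):');
end.
```

Comparing the two reports shows which of the properties follow a change of 
DefaultSystemCodePage and which stay fixed.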

---

Btw. you have a bug in:

constructor TStringStream.CreateRaw(const AString: RawByteString);
var
   CP: TSystemCodePage;
begin
   CP:=StringCodePage(AString);
   if (CP=CP_ACP) or (CP=TEncoding.Default.CodePage) then // this line is wrong
     begin
     FEncoding:=TEncoding.Default;
     FOwnsEncoding:=False;
     end
   else

In the code above, TEncoding.Default is used if CP=CP_ACP. That is 
currently wrong, and the bug perfectly reflects my suggestion for changing 
TEncoding.Default. Currently, CP_ACP corresponds to 
DefaultSystemCodePage and thus to TEncoding.SystemEncoding, not to 
TEncoding.Default. TEncoding.Default corresponds to ANSI (which is not 
CP_ACP, as documented at https://wiki.freepascal.org/FPC_Unicode_support ).

The code should be:
if (CP=CP_ACP) or (CP=TEncoding.SystemEncoding.CodePage) then
begin
   FEncoding:=TEncoding.SystemEncoding;
   FOwnsEncoding:=False;
end else
if (CP=TEncoding.Default.CodePage) then
begin
   FEncoding:=TEncoding.Default;
   FOwnsEncoding:=False;
end else
// ...

The current CreateRaw code would be correct under my suggestion. As you can 
see, you intuitively expected the approach I am suggesting :)

Ondrej


