[fpc-devel] TEncoding.Default and default encoding for TStrings.LoadFrom*()

Michael Van Canneyt michael at freepascal.org
Fri Dec 27 10:40:38 CET 2019

On Fri, 27 Dec 2019, Ondrej Pokorny wrote:

> On 27.12.2019 0:19, Michael Van Canneyt wrote:
>> On Thu, 26 Dec 2019, Ondrej Pokorny wrote:
>>> On 26.12.2019 19:29, Michael Van Canneyt wrote:
>>>> So no, I don't think these need to be changed/merged. What IMO can be
>>>> discussed is which of these 2 need to be used as the default codepage
>>>> in other code. It should then resolve the problems that appear, I think.
>>> That would be possible as well. But still please reconsider it:
>>> One reason: just from the convention - the default codepage to use 
>>> should be TEncoding.Default. That is intuitive.
>>> Second reason: Now we have TEncoding.ANSI = TEncoding.Default. 2 
>>> equal properties. And another FPC-only property 
>>> TEncoding.SystemEncoding. That means 3 properties for 2 values.
>> As far as I know, TEncoding.ANSI = CP_ACP.
> This is indeed not correct. See 
> https://wiki.freepascal.org/FPC_Unicode_support :
> CP_ACP: this value represents the currently set "default system code 
> page". See #Code page settings for more information.

I meant the Windows meaning of CP_ACP, not what the RTL makes of it.
I think the use of CP_ACP in the RTL is quite dubious.

Using CP_SYSTEM or similar would have been better. No doubt this is yet
another case of Delphi-compatible naming :(

> TMBCSEncoding.Create(widestringmanager.GetStandardCodePageProc(scpAnsi))

This corresponds to what I meant.

> and
>   TStandardCodePageEnum = (
>     scpAnsi,                 // system Ansi code page (GetACP on windows)
> - as you can see the CP_ACP value does not correspond with the GetACP 
> WinAPI call result. (But this is wanted as documented in 
> https://wiki.freepascal.org/FPC_Unicode_support ).
>> Why should this equal TEncoding.Default ? 
> sysencoding.inc:
> class function TEncoding.GetDefault: TEncoding;
> begin
>   Result := GetANSI;
> end;

I know that is how it currently works; the question is: why? :)

Maybe Default would be better off returning SystemEncoding, see below.

>> I think  TEncoding.Default  = CP_UTF8 on linux ?
> Yes, in FPC this is correct. Also TEncoding.ANSI =CP_UTF8 on linux in FPC.

Not necessarily, if I read the wiki page correctly.

>> The main problem I see is that there is the system (OS) encoding, and the
>> encoding specified by DefaultSystemCodePage.
>> These do not necessarily agree. So it makes sense to have 2 TEncodings:
>> one for the system encoding, one for the DefaultSystemCodePage variable.
>> They will not be equal.
>> If they were, then the DefaultSystemCodePage variable makes no sense
>> whatever.
> Yes, indeed. Therefore I suggested
> * TEncoding.Default for the DefaultSystemCodePage variable
> and
> * TEncoding.ANSI for the system encoding.
> Currently we have
> * TEncoding.SystemEncoding for the DefaultSystemCodePage variable
> and
> * both TEncoding.ANSI and TEncoding.Default for the system encoding. 
> (TEncoding.ANSI and TEncoding.Default are equal in FPC.)

In that case, why not simply change:

  class function TEncoding.GetDefault: TEncoding;
  begin
    Result := GetSystemEncoding;
  end;

Nothing needs to be removed. I consider SystemEncoding a better name than
Default, and the latter should only be kept for Delphi compatibility. IMHO it
would be better to avoid Default altogether. In fact, I would change references
to Default to SystemEncoding for clarity: Default is completely non-descriptive.

If I understand your reasoning correctly, that should solve the problems you
report?
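
To see how the three properties discussed above relate on a given platform,
a minimal sketch like the following could be used. It only prints code page
numbers and assumes an FPC version whose SysUtils provides the FPC-only
TEncoding.SystemEncoding property; the program name is arbitrary. The values
are platform dependent (and on Unix may depend on whether a widestring manager
such as cwstring is installed), so no expected output is given.

```pascal
program ShowEncodings;

{$mode objfpc}{$H+}

uses
  SysUtils;

begin
  // RTL variable from the System unit
  WriteLn('DefaultSystemCodePage    = ', DefaultSystemCodePage);
  // Delphi-compatible properties
  WriteLn('TEncoding.ANSI           = ', TEncoding.ANSI.CodePage);
  WriteLn('TEncoding.Default        = ', TEncoding.Default.CodePage);
  // FPC-only property
  WriteLn('TEncoding.SystemEncoding = ', TEncoding.SystemEncoding.CodePage);
end.
```

If GetDefault were changed as suggested, the Default and SystemEncoding lines
would be expected to report the same code page.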


