[fpc-pascal] String literals and code page of .pas source file

Michael Van Canneyt michael at freepascal.org
Mon Sep 14 13:39:36 CEST 2020



On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:

> On 2020-09-12 23:03, Tomas Hajny wrote:
>> On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
>>> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>>>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org>
>>>> wrote on Sat, 12 Sep 2020, 17:47:
>  .
>  .
>> While performing some tests, I came across other things which are not
>> very nice either (those are specific to the Win32/Win64 target due to
>> the difference between process codepage and console codepage). Let's
>> take the following test program:
>> 
>> {$codepage cp1250}
>> {$IFDEF USECRT}
>> uses
>>  Crt;
>> {$ENDIF USECRT}
>> const
>>  S = 'žluťoučký kůň';
>> var
>>  T: string;
>> begin
>>  T := S;
>> {$IFDEF USECRT}
>>  Write ('Using Crt');
>> {$ELSE USECRT}
>>  Write ('Not using Crt');
>> {$ENDIF USECRT}
>>  WriteLn (S);
>>  WriteLn (T);
>>  WriteLn (DefaultSystemCodepage);
>>  WriteLn (TextRec (Output).Codepage);
>> end.
>> 
>> Let's compile it _without_ -dUSECRT and _with_ -Mfpc first. The
>> original poster uses the same default codepage as me. If I start
>> cmd.exe and run "chcp" without parameters, it shows codepage 852 as
>> the console codepage. Now run the test program. It shows that the
>> codepage for the default file handle Output matches the console
>> codepage (as it should), but the string output is incorrect for both
>> WriteLn(S) and WriteLn(T) lines. If you perform "chcp 1250" and run
>> the program again, the codepages match and the string output is
>> correct.
>> 
>> If you compile the same program with -dUSECRT, the output is correct
>> for both WriteLn calls regardless of the console codepage setting
>> (i.e. both for "chcp 852" and for "chcp 1250" - and also for "chcp
>> 65001").
>> 
>> If you compile the same program with -dUSECRT and -Mdelphi together
>> and run the program in a console window set to codepage 852 (i.e. the
>> default setting here), the first WriteLn call is wrong, whereas the
>> second gives a correct result (due to the fact that T becomes an
>> ansistring in mode Delphi and dynamic translation is thus performed as
>> opposed to the case when a shortstring or an untyped constant are
>> passed).
>
> Questions resulting from my test above and the observed inconsistencies:
>
> 1) Wouldn't it be better if shortstrings are treated the same way as 
> ansistrings with CP_ACP? This would make a difference only during 
> assignments to strings with different codepages. Since strings with 
> different codepages didn't exist in the past (and in the current 
> situation they are simply broken), this change hopefully shouldn't 
> break compatibility.

No idea what to advise here. 
I would think shortstring is ASCII or OEM codepage, not even ANSI :/

> 2) Shouldn't WriteLn with an untyped string constant parameter result in 
> calling some Unicode based version of WriteLn rather than the 
> shortstring overloaded version (since the constant is stored in UTF-16 
> internally)?

What is the codepage of a constant string? Should this not be used?
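
One way to look into this empirically is to ask the RTL which code page it
attaches to strings assigned from the constant. The following is only a
sketch, assuming FPC 3.0 or later, a Windows target as in the test above,
and a source file that is really saved as UTF-8. The program name and the
type TCP1250String are made up for illustration. No output is shown,
because it depends on the system code page.

```pascal
program cpinspect;  // hypothetical name
{$mode objfpc}{$H+}
{$codepage utf8}    // assumption: this file is saved as UTF-8

type
  // Illustrative ansistring type with a fixed code page
  TCP1250String = type AnsiString(1250);

const
  S = 'žluťoučký kůň';

var
  A: AnsiString;      // declared with CP_ACP
  B: TCP1250String;   // declared with code page 1250
begin
  A := S;
  B := S;
  // StringCodePage reports the dynamic code page stored with the string data
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
  WriteLn('StringCodePage(A):     ', StringCodePage(A));
  WriteLn('StringCodePage(B):     ', StringCodePage(B));
end.
```

Comparing the three numbers on a given system shows whether the value
assigned from the constant carries the declared code page of its
destination or something else.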

>
> 3) Shouldn't we try to make the output of Write with and without unit 
> Crt compatible to each other? If we do so, what should be the encoding 
> used for output redirected to a file - should it use 
> DefaultSystemCodePage, or scpConsoleCP, or what (remember that this 
> question doesn't exist with unit Crt, because unit Crt isn't compatible 
> with redirection).

I think this last one is in fact 3 questions:
- What to do if output is redirected externally? (IMHO nothing)
- What to do if output is redirected internally? (IMHO, the codepage should be kept)
- Whether and how to extend Crt so it works with Unicode.
   (Since Crt is legacy, I would not touch it; you'd need to rewrite it as Unicode.)
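
For the internally redirected case, the code page attached to a text file
can be inspected and set explicitly by the program. A minimal sketch,
assuming FPC 3.0 or later and a Windows target; the program name, the file
name out.txt and the choice of UTF-8 are assumptions for illustration only,
not a proposal for what the RTL should do by default:

```pascal
program redirsketch;  // hypothetical name
{$mode objfpc}{$H+}
{$codepage utf8}      // assumption: this file is saved as UTF-8

var
  F: Text;
  U: UnicodeString;
begin
  U := 'žluťoučký kůň';
  Assign(F, 'out.txt');          // hypothetical file name
  Rewrite(F);
  // Ask the RTL to convert strings written to F to UTF-8
  SetTextCodePage(F, CP_UTF8);
  WriteLn(F, U);
  // Code pages currently attached to the file record and to standard output
  WriteLn('F:      ', TextRec(F).CodePage);
  WriteLn('Output: ', TextRec(Output).CodePage);
  Close(F);
end.
```

SetTextCodePage is called after Rewrite here so that the setting is in
effect for the writes that follow, whatever the open call does to the
record.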

Michael.

