[fpc-pascal] String literals and code page of .pas source file
Tomas Hajny
XHajT03 at hajny.biz
Mon Sep 14 00:28:11 CEST 2020
On 2020-09-12 23:03, Tomas Hajny wrote:
> On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
>> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org>
>>> wrote on Sat, 12 Sep 2020, 17:47:
.
.
> While performing some tests, I came across other things which are not
> very nice either (those are specific to the Win32/Win64 target due to
> the difference between process codepage and console codepage). Let's
> take the following test program:
>
> {$codepage cp1250}
> {$IFDEF USECRT}
> uses
>   Crt;
> {$ENDIF USECRT}
> const
>   S = 'žluťoučký kůň';
> var
>   T: string;
> begin
>   T := S;
> {$IFDEF USECRT}
>   Write ('Using Crt');
> {$ELSE USECRT}
>   Write ('Not using Crt');
> {$ENDIF USECRT}
>   WriteLn (S);
>   WriteLn (T);
>   WriteLn (DefaultSystemCodepage);
>   WriteLn (TextRec (Output).Codepage);
> end.
>
> Let's compile it _without_ -dUSECRT and _with_ -Mfpc first. The
> original poster uses the same default codepage as me. If I start
> cmd.exe and run "chcp" without parameters, it shows codepage 852 as
> the console codepage. Now run the test program. It shows that the
> codepage for the default file handle Output matches the console
> codepage (as it should), but the string output is incorrect for both
> WriteLn(S) and WriteLn(T) lines. If you perform "chcp 1250" and run
> the program again, the codepages match and the string output is
> correct.
>
> If you compile the same program with -dUSECRT, the output is correct
> for both WriteLn calls regardless of the console codepage setting
> (i.e. both for "chcp 852" and for "chcp 1250" - and also for "chcp
> 65001").
>
> If you compile the same program with -dUSECRT and -Mdelphi together
> and run the program in a console window set to codepage 852 (i.e. the
> default setting here), the first WriteLn call is wrong, whereas the
> second gives a correct result (because T becomes an ansistring in
> mode Delphi and dynamic translation is thus performed, as opposed to
> the case when a shortstring or an untyped constant is passed).
Questions resulting from my test above and the observed inconsistencies:
1) Wouldn't it be better if shortstrings were treated the same way as
ansistrings with CP_ACP? This would make a difference only during
assignments to strings with different codepages. Since strings with
different codepages didn't exist in the past (and in the current
situation they are simply broken), this change hopefully shouldn't
break compatibility.
2) Shouldn't WriteLn with an untyped string constant parameter result in
calling some Unicode based version of WriteLn rather than the
shortstring overloaded version (since the constant is stored in UTF-16
internally)?
3) Shouldn't we try to make the output of Write with and without unit
Crt consistent with each other? If we do so, what should be the
encoding used for output redirected to a file - should it use
DefaultSystemCodePage, or scpConsoleCP, or something else? (Remember
that this question doesn't arise with unit Crt, because unit Crt isn't
compatible with redirection.)
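
To make questions 1 and 2 more concrete, here is a minimal, untested sketch of the kind of variant one could compare against the test program above. It assumes FPC 3.0 or later, a Win32/Win64 target, and a source file actually saved in CP1250. The type name TCP1250String and the program name are made up for illustration only. The idea is that both variables carry their encoding with them, so I would expect the RTL to translate them to the codepage of Output on its own even in mode FPC, unlike the shortstring and untyped constant cases.

```pascal
{$codepage cp1250}
{ Sketch only: assumes this file is saved as CP1250 and built for
  Win32/Win64 with FPC 3.0 or later; behaviour not verified here. }
program cpsketch;

type
  { Hypothetical helper type: an ansistring declared with codepage 1250. }
  TCP1250String = type AnsiString(1250);

const
  S = 'žluťoučký kůň';

var
  A: TCP1250String;
  U: UnicodeString;
begin
  A := S;                        { ansistring tagged with an explicit codepage }
  U := S;                        { UTF-16 copy of the same constant }
  WriteLn (A);                   { candidate for dynamic translation on output }
  WriteLn (U);                   { goes through the Unicode overload of WriteLn }
  WriteLn (StringCodePage (A));  { codepage actually stored with A }
  WriteLn (TextRec (Output).Codepage);
end.
```

If both WriteLn calls come out right under "chcp 852" without unit Crt, that would suggest the remaining problem is limited to the shortstring and untyped constant paths.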
Tomas