[fpc-pascal] String literals and code page of .pas source file
Michael Van Canneyt
michael at freepascal.org
Mon Sep 14 13:39:36 CEST 2020
On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:
> On 2020-09-12 23:03, Tomas Hajny wrote:
>> On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
>>> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>>>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org>
>>>> wrote on Sat, 12 Sep 2020, 17:47:
> .
> .
>> While performing some tests, I came across other things which are not
>> very nice either (those are specific to the Win32/Win64 target due to
>> the difference between process codepage and console codepage). Let's
>> take the following test program:
>>
>> {$codepage cp1250}
>> {$IFDEF USECRT}
>> uses
>>   Crt;
>> {$ENDIF USECRT}
>> const
>>   S = 'žluťoučký kůň';
>> var
>>   T: string;
>> begin
>>   T := S;
>> {$IFDEF USECRT}
>>   Write ('Using Crt');
>> {$ELSE USECRT}
>>   Write ('Not using Crt');
>> {$ENDIF USECRT}
>>   WriteLn (S);
>>   WriteLn (T);
>>   WriteLn (DefaultSystemCodepage);
>>   WriteLn (TextRec (Output).Codepage);
>> end.
>>
>> Let's compile it _without_ -dUSECRT and _with_ -Mfpc first. The
>> original poster uses the same default codepage as me. If I start
>> cmd.exe and run "chcp" without parameters, it shows codepage 852 as
>> the console codepage. Now run the test program. It shows that the
>> codepage for the default file handle Output matches the console
>> codepage (as it should), but the string output is incorrect for both
>> WriteLn(S) and WriteLn(T) lines. If you perform "chcp 1250" and run
>> the program again, the codepages match and the string output is
>> correct.
>>
>> If you compile the same program with -dUSECRT, the output is correct
>> for both WriteLn calls regardless of the console codepage setting
>> (i.e. both for "chcp 852" and for "chcp 1250" - and also for "chcp
>> 65001").
>>
>> If you compile the same program with -dUSECRT and -Mdelphi together
>> and run the program in a console window set to codepage 852 (i.e. the
>> default setting here), the first WriteLn call is wrong, whereas the
>> second gives a correct result (due to the fact that T becomes an
>> ansistring in mode Delphi and dynamic translation is thus performed as
>> opposed to the case when a shortstring or an untyped constant are
>> passed).
>
> Questions resulting from my test above and the observed inconsistencies:
>
> 1) Wouldn't it be better if shortstrings are treated the same way as
> ansistrings with CP_ACP? This would make a difference only during
> assignments to strings with different codepages. Since strings with
> different codepages didn't exist in the past (and in the current
> situation they are simply broken), this change hopefully shouldn't
> break compatibility.
No idea what to advise here.
I would think shortstring is ASCII or OEM codepage, not even ANSI :/
> 2) Shouldn't WriteLn with an untyped string constant parameter result in
> calling some Unicode based version of WriteLn rather than the
> shortstring overloaded version (since the constant is stored in UTF-16
> internally)?
What is the codepage of a constant string? Should this not be used?
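To see which codepage a constant actually carries once it lands in a string variable, something like the following could be tried. It is only a minimal sketch: it assumes FPC 3.0 or later (where StringCodePage is in the System unit), the program name is made up, and I use {$codepage utf8} with a UTF-8 encoded source file rather than cp1250 so that it can be pasted from this mail as is.

```pascal
{$mode objfpc}{$H+}
{$codepage utf8}   // assumption: this source file is saved as UTF-8
program ConstCodePage;   // hypothetical program name

const
  S = 'žluťoučký kůň';

var
  A: AnsiString;   // declared with CP_ACP
  U: UTF8String;   // declared with CP_UTF8
begin
  // Both assignments are from the same untyped constant.
  A := S;
  U := S;

  // StringCodePage reports the dynamic codepage stored with each string.
  WriteLn('AnsiString codepage:   ', StringCodePage(A));
  WriteLn('UTF8String codepage:   ', StringCodePage(U));
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);
end.
```

A shortstring has no such codepage field, so it cannot be inspected this way.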
>
> 3) Shouldn't we try to make the output of Write with and without unit
> Crt compatible with each other? If we do so, what should be the encoding
> used for output redirected to a file - should it use
> DefaultSystemCodePage, or scpConsoleCP, or something else? (Remember
> that this question doesn't arise with unit Crt, because unit Crt isn't
> compatible with redirection.)
I think this last one is in fact 3 questions:
- What to do if output is redirected externally? (IMHO nothing)
- What to do if output is redirected internally? (IMHO, the codepage should be kept)
- Whether and how to extend Crt so it works with Unicode.
(Since Crt is legacy, I would not touch it; you'd need to rewrite it as Unicode.)
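For the internally redirected case, keeping the codepage could look roughly like the sketch below. It assumes FPC 3.0 or later, where SetTextCodePage and CP_UTF8 are available in the System unit; the program name, the file name out.txt and the choice of UTF-8 are purely illustrative, and the source file is assumed to be saved as UTF-8.

```pascal
{$mode objfpc}{$H+}
{$codepage utf8}   // assumption: this source file is saved as UTF-8
program KeepCodePage;   // hypothetical program name

var
  F: Text;
  U: UTF8String;
begin
  U := 'žluťoučký kůň';

  Assign(F, 'out.txt');          // hypothetical output file
  Rewrite(F);
  // Fix the codepage of the text file explicitly instead of relying
  // on whatever the default happens to be on this system.
  SetTextCodePage(F, CP_UTF8);

  // U is already CP_UTF8, so no translation is needed on output.
  WriteLn(F, U);
  Close(F);
end.
```

The same call on Output would be one way to pin the codepage of standard output, whether or not that is desirable when the redirection happens outside the program.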
Michael.