[fpc-pascal] String literals and code page of .pas source file

Tomas Hajny XHajT03 at hajny.biz
Sat Sep 12 23:03:11 CEST 2020


On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote
>> on Sat, 12 Sep 2020, 17:47:
>> 
>>     > All the doubts, questions, and discussions prove that current system is
>>     > counter-intuitive and confusing.
>> 
>>     The issue in this thread is caused by a bug in the LCL: it blindly
>>     assumes that the dynamic code page of the caption string is always
>>     utf-8. That is simply wrong (unless you put the burden on the user to
>>     always assign an utf-8-encoded string to it, but _that_ is
>>     counter-intuitive and confusing).
>> 
>> 
>> But shouldn't the compiler insert a conversion if the string is declared
>> as CP_1250 and the destination is CP_ACP?
> 
> There are two things:
> 1) regardless of how the static code page of a string is declared, it is
> never guaranteed that its dynamic code page will match it. The simplest
> example is when you assign a RawByteString to it. There is, however,
> also a second case (and this one indeed is counter-intuitive, but needed
> for backward compatibility):
> 2) the second bullet under
> https://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page
> 
> That's what gets triggered here: the source file CP is CP_1250 and the
> string is also ansistring(1250). That case would be solved by declaring
> Label as UTF8String though.

Yep.
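
The dynamic code page a string actually carries can be inspected with 
StringCodePage. A minimal sketch follows; the type name TCP1250String 
is made up for illustration, the source file is assumed to be really 
saved in CP1250, and no particular output is claimed here since it 
depends on the compiler version and the system settings:

```pascal
{$codepage cp1250}
type
  { hypothetical name, used only in this sketch }
  TCP1250String = type AnsiString(1250);
var
  A: AnsiString;       { static code page CP_ACP }
  C: TCP1250String;    { static code page 1250 }
  R: RawByteString;    { no conversion on assignment }
begin
  A := 'žluťoučký kůň';
  C := 'žluťoučký kůň';
  R := C;
  { print the dynamic code page of each variable }
  WriteLn ('AnsiString:       ', StringCodePage (A));
  WriteLn ('AnsiString(1250): ', StringCodePage (C));
  WriteLn ('RawByteString:    ', StringCodePage (R));
  WriteLn ('Default system:   ', DefaultSystemCodePage);
end.
```

Comparing the first value with DefaultSystemCodePage shows whether the 
backward-compatibility case from the wiki applies to a given build.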

While performing some tests, I came across other things which are not 
very nice either (those are specific to the Win32/Win64 target due to 
the difference between process codepage and console codepage). Let's 
take the following test program:

{$codepage cp1250}
{$IFDEF USECRT}
uses
  Crt;
{$ENDIF USECRT}
const
  S = 'žluťoučký kůň';
var
  T: string;
begin
  T := S;
{$IFDEF USECRT}
  WriteLn ('Using Crt');
{$ELSE USECRT}
  WriteLn ('Not using Crt');
{$ENDIF USECRT}
  WriteLn (S);
  WriteLn (T);
  WriteLn (DefaultSystemCodepage);
  WriteLn (TextRec (Output).Codepage);
end.

Let's compile it _without_ -dUSECRT and _with_ -Mfpc first. The original 
poster uses the same default codepage as me. If I start cmd.exe and run 
"chcp" without parameters, it shows codepage 852 as the console 
codepage. Now run the test program. It shows that the codepage for the 
default file handle Output matches the console codepage (as it should), 
but the string output is incorrect for both WriteLn(S) and WriteLn(T) 
lines. If you perform "chcp 1250" and run the program again, the 
codepages match and the string output is correct.

If you compile the same program with -dUSECRT, the output is correct for 
both WriteLn calls regardless of the console codepage setting (i.e. 
both for "chcp 852" and for "chcp 1250" - and also for "chcp 65001").

If you compile the same program with -dUSECRT and -Mdelphi together and 
run the program in a console window set to codepage 852 (i.e. the 
default setting here), the first WriteLn call is wrong, whereas the 
second gives a correct result. The reason is that T becomes an 
ansistring in mode Delphi, so dynamic translation is performed, 
whereas no such translation happens when a shortstring or an untyped 
constant is passed.
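
If the difference really comes from T being an ansistring, the same 
effect should not need the mode switch at all. The following variant is 
only a sketch under that assumption (it has not been run for this 
message, and the source file is again assumed to be saved in CP1250); 
it declares T explicitly as AnsiString and prints the codepages 
involved so that the translation can be checked:

```pascal
{$codepage cp1250}
const
  S = 'žluťoučký kůň';
var
  T: AnsiString;  { explicit type, independent of -Mfpc / -Mdelphi and $H }
begin
  T := S;
  WriteLn (S);    { untyped constant passed directly }
  WriteLn (T);    { ansistring carrying a dynamic code page }
  WriteLn (StringCodePage (T));
  WriteLn (DefaultSystemCodepage);
  WriteLn (TextRec (Output).Codepage);
end.
```

Running it under "chcp 852" and "chcp 1250" and comparing the two 
string lines would show whether only the ansistring case is translated 
to the codepage of Output.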

Tomas

