[fpc-pascal] String literals and code page of .pas source file
Tomas Hajny
XHajT03 at hajny.biz
Sat Sep 12 23:03:11 CEST 2020
On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote on
>> Sat, 12 Sep 2020, 17:47:
>>
>> > All the doubts, questions, and discussions prove that current system is
>> > counter-intuitive and confusing.
>>
>> The issue in this thread is caused by a bug in the LCL: it blindly
>> assumes that the dynamic code page of the caption string is always
>> utf-8. That is simply wrong (unless you put the burden on the user to
>> always assign an utf-8-encoded string to it, but _that_ is
>> counter-intuitive and confusing).
>>
>>
>> But shouldn't the compiler insert a conversion if the string is declared
>> as CP_1250 and the destination is CP_ACP?
>
> There are two things:
> 1) regardless of how the static code page of a string is declared, it is
> never guaranteed that its dynamic code page will match it. The simplest
> example is when you assign a RawByteString to it. There is, however,
> also a second case (and this one indeed is counter-intuitive, but needed
> for backward compatibility):
> 2) the second bullet under
> https://wiki.freepascal.org/FPC_Unicode_support#Dynamic_code_page
>
> That's what gets triggered here: the source file CP is CP_1250 and the
> string is also ansistring(1250). That case would be solved by declaring
> Label as UTF8String though.
Yep.
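A minimal sketch of point 1) above, assuming FPC 3.x (the program name and the TCp1250String type are hypothetical, not taken from the thread): a RawByteString whose bytes are tagged as UTF-8 is assigned to a string declared with code page 1250. Since the compiler inserts no conversion when the source is a RawByteString, the target should keep the source's dynamic code page.

```pascal
program DynCodePageDemo;  { hypothetical demo program }
{$mode objfpc}{$H+}

type
  { Hypothetical type: an ansistring whose *static* code page is 1250 }
  TCp1250String = type AnsiString(1250);

var
  R: RawByteString;
  S: TCp1250String;
begin
  R := 'abc';
  { Re-tag the bytes as UTF-8 without converting them
    (Convert = False only changes the dynamic code page) }
  SetCodePage(R, CP_UTF8, False);
  { No conversion is inserted for a RawByteString source, so S
    takes over R's data together with its dynamic code page }
  S := R;
  { Expected to print 65001 (CP_UTF8), not the declared 1250 }
  WriteLn(StringCodePage(S));
end.
```

If that holds, code that assumes a string's dynamic code page always equals its declared one (or, as with the LCL caption, is always UTF-8) cannot rely on that assumption.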
While performing some tests, I came across other behaviour that is not
very nice either (it is specific to the Win32/Win64 targets, due to the
difference between the process codepage and the console codepage). Take
the following test program:
{$codepage cp1250}
{$IFDEF USECRT}
uses
  Crt;
{$ENDIF USECRT}
const
  S = 'žluťoučký kůň';  // untyped string constant
var
  T: string;  // shortstring in mode FPC, ansistring in mode Delphi
begin
  T := S;
{$IFDEF USECRT}
  Write ('Using Crt');
{$ELSE USECRT}
  Write ('Not using Crt');
{$ENDIF USECRT}
  WriteLn (S);
  WriteLn (T);
  WriteLn (DefaultSystemCodepage);      // process (ANSI) codepage
  WriteLn (TextRec (Output).Codepage);  // codepage of the Output file handle
end.
Let's compile it _without_ -dUSECRT and _with_ -Mfpc first. The original
poster uses the same default codepage as I do. If I start cmd.exe and run
"chcp" without parameters, it reports 852 as the console codepage. Now run
the test program. It shows that the codepage of the default file handle
Output matches the console codepage (as it should), but the string output
is incorrect for both the WriteLn(S) and the WriteLn(T) lines. If you run
"chcp 1250" and start the program again, the codepages match and the
string output is correct.
If you compile the same program with -dUSECRT, the output is correct for
both WriteLn calls regardless of the console codepage setting (i.e. for
"chcp 852", for "chcp 1250", and also for "chcp 65001").
If you compile the same program with -dUSECRT and -Mdelphi together and
run it in a console window set to codepage 852 (i.e. the default setting
here), the output of the first WriteLn call is wrong, whereas the second
one is correct. This is because T is an ansistring in mode Delphi, so a
dynamic codepage translation is performed for it, as opposed to the case
when a shortstring or an untyped constant is passed.
Tomas