[fpc-pascal] String literals and code page of .pas source file

Tomas Hajny XHajT03 at hajny.biz
Mon Sep 14 14:11:01 CEST 2020


On 2020-09-14 13:39, Michael Van Canneyt wrote:
> On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:
> 
>> On 2020-09-12 23:03, Tomas Hajny wrote:
>>> On 2020-09-12 18:51, Jonas Maebe via fpc-pascal wrote:
>>>> On 12/09/2020 18:44, Sven Barth via fpc-pascal wrote:
>>>>> Jonas Maebe via fpc-pascal <fpc-pascal at lists.freepascal.org>
>>>>> wrote on Sat., 12 Sep. 2020, 17:47:
  .
  .
>> 1) Wouldn't it be better if shortstrings are treated the same way as 
>> ansistrings with CP_ACP? This would make a difference only during 
>> assignments to strings with different codepages. Since strings with 
>> different codepages didn't exist in the past (and in the current 
>> situation they are simply broken), this change shouldn't break 
>> compatibility hopefully.
> 
> No idea what to advise here. I would think shortstring is ASCII or OEM
> codepage, not even ANSI :/

As far as I'm concerned, there's no difference between OEM and ANSI (or 
ISO 8859-x for that matter) _unless_ somebody targets Win32/Win64 and 
never anything else. From this point of view, there's no reason why 
shortstrings should always be OEM. Historically, they simply used the 
default character set of the particular operating environment.


>> 2) Shouldn't WriteLn with an untyped string constant parameter result 
>> in calling some Unicode based version of WriteLn rather than the 
>> shortstring overloaded version (since the constant is stored in 
>> UTF-16 internally)?
> 
> What is the codepage of a constant string ? Should this not be used ?

That's what I wrote - internally, the (untyped) constant strings are 
stored in UTF-16.
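
To make the question concrete, here is a minimal sketch (an illustration 
only, not code from this thread). It assumes the source file is saved as 
UTF-8 and says so with {$codepage utf8}; the program name and the constant 
name Greeting are made up. What ends up on the screen depends on the 
platform, the console code page and whether a widestring manager is 
installed, so no expected output is given.

```pascal
program ConstCodePageDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

const
  // Untyped string constant containing non-ASCII characters.
  Greeting = 'žluťoučký kůň';

var
  S: ShortString;
  U: UnicodeString;
begin
  // Assigning to a UnicodeString keeps one UTF-16 code unit per character here.
  U := Greeting;
  // Assigning to a ShortString forces a conversion to a single-byte string;
  // the target code page is exactly what point 1 is asking about.
  S := Greeting;

  WriteLn('UnicodeString length: ', Length(U));
  WriteLn('ShortString length:   ', Length(S));

  // The call in question: which WriteLn overload does the compiler pick
  // for the untyped constant?
  WriteLn(Greeting);
end.
```

Comparing the last line's output with WriteLn(U) and WriteLn(S) on a given 
target is one way to see which conversion path was taken.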


>> 3) Shouldn't we try to make the output of Write with and without unit 
>> Crt compatible to each other? If we do so, what should be the encoding 
>> used for output redirected to a file - should it use 
>> DefaultSystemCodePage, or scpConsoleCP, or what (remember that this 
>> question doesn't exist with unit Crt, because unit Crt isn't 
>> compatible with redirection).
> 
> I think this last one is in fact 3 questions:
> - What to do if output is redirected externally ? (IMHO nothing)

There's no "nothing". Every text file record has an attribute stating 
the codepage used for that text file. The question is which codepage 
should be assigned there in which cases.
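
A small sketch of how that attribute can be inspected and changed from 
user code, using GetTextCodePage and SetTextCodePage from the System unit 
(available in FPC 3.0 and later, as far as I know). The program name is 
made up, and the values printed differ between platforms and 
configurations, so none are shown.

```pascal
program TextCodePageDemo;
{$mode objfpc}{$H+}

begin
  // Code page used by default for ansistrings declared without an
  // explicit code page.
  WriteLn('DefaultSystemCodePage: ', DefaultSystemCodePage);

  // Code page currently recorded in the text file record of Output.
  WriteLn('Output code page:      ', GetTextCodePage(Output));

  // Change the attribute; subsequent writes of code page aware strings
  // to Output are converted to this code page.
  SetTextCodePage(Output, CP_UTF8);
  WriteLn('Output code page now:  ', GetTextCodePage(Output));
end.
```

Running this once normally and once with standard output redirected to a 
file shows what the runtime currently assigns in each case.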


> - What to do if output is redirected internally ? (IMHO, the codepage
> should be kept)

Kept from what?


> - Whether and how to extend Crt so it works with unicode.
>   (Since Crt is legacy, I would not touch it; You'd need to rewrite it
> as unicode.)

That is a different question and I don't want to raise that one. My 
question is simply whether WriteLn (shortstring) should behave 
differently with and without Crt.

Tomas

