[fpc-pascal] Default source encoding

Jonas Maebe jonas.maebe at elis.ugent.be
Thu Mar 31 14:38:35 CEST 2016


On 31/03/16 14:12, Mattias Gaertner wrote:
> On Thu, 31 Mar 2016 13:52:54 +0200
> Jonas Maebe <jonas.maebe at elis.ugent.be> wrote:
>
>> On 31/03/16 13:46, Mattias Gaertner wrote:
>>
>>> According to
>>> http://wiki.freepascal.org/index.php?title=FPC_Unicode_support#String_constants
>>>
>>> "the constant strings are assumed to have code page 28591 (ISO 8859-1
>>> Latin 1; Western European)."
>>>
>>> Is this true?
>>
>> Yes.
>
> What happens on a Russian system cp1251 with a cp1251 AnsiString
> literal?
>
> writeln('Привет');

There are two separate things:
a) the code page that the compiler uses *if* it has to convert a string 
at compile time to a different code page (e.g. because you assign the 
string constant to an ansistring(1251), or to a unicodestring)
b) whether or not it will in fact convert a string at compile time to a 
different code page

a) is what I was talking about above.

For b), the conditions are described in the section linked above.
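
To make the a)/b) distinction concrete, here is a minimal sketch (an 
illustration, not taken from the wiki). It assumes FPC 3.0 or later and a 
source file without any codepage directive. The program and variable names 
are made up, and the literal is written as numeric character constants for 
the cp1251 bytes of 'Привет', on the assumption that these behave like the 
raw bytes in the source file:

```pascal
program constconv;
{$mode objfpc}{$h+}
// No codepage directive on purpose: the source file code page stays CP_ACP.

var
  plain: AnsiString;     // declared code page CP_ACP, same as the source file
  wide: UnicodeString;   // UTF-16, so the constant has to be converted

begin
  // Case b) does not trigger: code pages match, the bytes are stored as-is.
  plain := #$CF#$F0#$E8#$E2#$E5#$F2;

  // Case b) triggers: the constant is converted at compile time, and the
  // compiler has to assume some code page for that conversion (case a).
  wide := #$CF#$F0#$E8#$E2#$E5#$F2;

  writeln('plain: length ', Length(plain),
          ', first byte $', HexStr(Ord(plain[1]), 2));
  writeln('wide:  length ', Length(wide),
          ', first code unit $', HexStr(Ord(wide[1]), 4));
end.
```

If the conversion for "wide" really assumes code page 28591, each byte 
maps to the Unicode code point with the same numeric value, so the 
Cyrillic text would not survive that assignment.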

So, in this case: if the source file code page is CP_ACP (i.e., no 
explicit code page specified), then writeln('constant') will call either 
writeln(shortstring) or writeln(rawbytestring) (I don't know offhand 
which one; it may depend on the state of {$h+}), and hence the described 
rules for assigning a constant string to a shortstring/rawbytestring apply.

Therefore, no *compile time* conversion of the string type will happen 
in this case, since the code page of the string constant and that of the 
called helper match, or because the called helper uses rawbytestring.

This means that the string constant will be stored unmodified in the 
binary, with CP_ACP as its code page (a situation that can never happen 
in Delphi versions with support for code page-aware strings, but which 
is the default in FPC because it matches the behaviour of previous FPC 
and Delphi versions). The string constant will then be interpreted at 
run time using whatever the actual value of DefaultSystemCodePage is at 
that time.

So with DefaultSystemCodePage = 1251, a string constant encoded in 
cp1251, and with source file code page = CP_ACP, the result of writeln will 
be correct. Running such a program with a different 
DefaultSystemCodePage may result in errors (depending on how much the 
actual code page differs from cp1251 for the printed character).
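
The run-time side can be inspected with a small sketch like the following. 
It assumes FPC 3.0 or later and again no codepage directive in the source. 
The program name is made up, and the cp1251 bytes of 'Привет' are spelled 
out as character constants so that the listing does not depend on how the 
file itself is saved:

```pascal
program acpconst;
{$mode objfpc}{$h+}
// No codepage directive on purpose: the source file code page stays CP_ACP.

var
  s: AnsiString;
  i: Integer;

begin
  // 'Привет' as cp1251 bytes; with a CP_ACP source these are stored as-is.
  s := #$CF#$F0#$E8#$E2#$E5#$F2;

  // The code page the run time will use to interpret CP_ACP strings.
  writeln('DefaultSystemCodePage: ', DefaultSystemCodePage);
  // The code page recorded for the string itself.
  writeln('StringCodePage(s):     ', StringCodePage(s));

  // Dump the raw bytes to confirm that nothing was converted.
  write('bytes:');
  for i := 1 to Length(s) do
    write(' ', HexStr(Ord(s[i]), 2));
  writeln;

  // Whether this line is readable depends on DefaultSystemCodePage
  // and on the encoding the console expects.
  writeln(s);
end.
```

Comparing the reported code pages on a cp1251 system and on, say, a UTF-8 
system shows which interpretation the final writeln(s) is subject to.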


Jonas
