[fpc-pascal] String literals and code page of .pas source file

Tomas Hajny XHajT03 at hajny.biz
Tue Sep 15 01:56:08 CEST 2020


On 2020-09-14 16:09, Michael Van Canneyt wrote:
> On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:
> 
>>>> application (let's say notepad.exe) will result in garbage.  I don't 
>>>> say that it is necessarily bad, but it should be documented at least 
>>>> if we want to keep it that way.
>>> 
>>> I would definitely keep it that way.
>>> 
>>> As I see it: Redirection or not should not matter, the system should
>>> assume console output.
>>> Things like 'tee' make this concept dubious in any case:
>>> 
>>> If you pipe output to a program, you don't expect the codepage to
>>> change because of the redirection.
>> 
>> No problem, but I'd suggest documenting it at least.
> 
> Document what exactly ? That redirecting does not change the codepage ?

1) Document that the following test program results in two different 
lines under Win32/Win64 (unless you change the console codepage to be 
equal to the process codepage before running the program):

{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, '');  { empty file name = standard output }
  Rewrite (T);
  A := S;
  WriteLn (A);
  WriteLn (T, A);
  Close (T);
end.
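
For illustration only, here is a minimal sketch of how the two lines could be made to match from within the program rather than by changing the console codepage. It assumes that SetTextCodePage and GetTextCodePage from the System unit (available in FPC 3.0 and later, as far as I know) behave as their names suggest, and that the code page set after Assign survives the Rewrite. I have not verified this on every target, so treat it as an assumption rather than as documented behaviour:

```pascal
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, '');  { empty file name = standard output }
  { Assumption: give T the same code page as the standard Output, }
  { so that both WriteLn calls perform the same conversion.       }
  SetTextCodePage (T, GetTextCodePage (Output));
  Rewrite (T);
  A := S;
  WriteLn (A);     { written via Output }
  WriteLn (T, A);  { written via T, now using the code page of Output }
  Close (T);
end.
```

As with the original test, the source file has to be saved in CP1250 for the {$CODEPAGE} directive to be truthful.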


2) Document that the following test program, when compiled and run with 
its output redirected to a file named output1.txt, results in two files 
with different content (again, unless you change the console codepage to 
the process codepage before running the program):

{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, 'output2.txt');
  Rewrite (T);
  A := S;
  WriteLn (A);
  WriteLn (T, A);
  Close (T);
end.


  .
  .
>> Not really accidental:
>> 
>> r3606 | florian | 2006-05-20 23:42:58 +0200 (Sat, 20 May 2006) | 2 lines
>> * fix from Maxim Ganetsky to fix CRT output with non latin code pages, 
>> should fix #6785
>> 
>> (there were additional changes performed later, but the primary change 
>> was this one)
> 
> Does this handle UTF8 ?

In what sense? It works correctly (under Win32/Win64) if I change the 
console codepage to 65001 (both for shortstrings and for ansistrings), 
and it works correctly if I assign a constant to a UTF8String and write 
it to the console, regardless of the console codepage.
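
The second case can be sketched as follows. This is a minimal example of what I mean, assuming the source file is saved in CP1250 as in the test programs above; the variable name U is only for illustration:

```pascal
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  U: utf8string;
begin
  { The constant is converted to UTF-8 on assignment. }
  U := S;
  { The string carries its own code page, so the RTL can convert }
  { it to whatever code page the console is using.               }
  WriteLn (U);
end.
```

For the first case, the console codepage can be switched with "chcp 65001" before running the program.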


> Judging by the sources, I would think not:
> 
> Interface
> 
> {$mode fpc} // Shortstring is assumed
> {$i crth.inc}
> 
> Const
>   { Controlling consts }
>   Flushing     = false;               {if true then don't buffer output}
>   ConsoleMaxX  = 1024;
>   ConsoleMaxY  = 1024;
>   ScreenHeight : longint = 25;
>   ScreenWidth  : longint = 80;
> 
> Type
>   TCharAttr=packed record
>     ch   : char;
>     attr : byte;
>   end;
>   TConsoleBuf=Array[0..ConsoleMaxX*ConsoleMaxY-1] of TCharAttr;
>   PConsoleBuf=^TConsoleBuf;
> 
> var
>   ConsoleBuf : PConsoleBuf;
> 
> Since every screen position handles only a single char (byte) there is
> no way this can handle UTF8. Maybe other "real" single-byte codepages,
> yes.

I assume that you're looking at the implementation for Linux, whereas I 
am talking about the implementation for Win32/Win64. I don't know if it 
makes any difference with regard to UTF-8 string handling on Linux (I'm 
not aware of any particular issues, but I might be wrong).

Tomas

