[fpc-pascal] String literals and code page of .pas source file
Tomas Hajny
XHajT03 at hajny.biz
Tue Sep 15 01:56:08 CEST 2020
On 2020-09-14 16:09, Michael Van Canneyt wrote:
> On Mon, 14 Sep 2020, Tomas Hajny via fpc-pascal wrote:
>
>>>> application (let's say notepad.exe) will result in garbage. I don't
>>>> say that it is necessarily bad, but it should be documented at least
>>>> if we want to keep it that way.
>>>
>>> I would definitely keep it that way.
>>>
>>> As I see it: Redirection or not should not matter, the system should
>>> assume console output.
>>> Things like 'tee' make this concept dubious in any case:
>>>
>>> If you pipe output to a program, you don't expect the codepage to
>>> change
>>> because of the redirection.
>>
>> No problem, but I'd suggest documenting it at least.
>
> Document what exactly ? That redirecting does not change the codepage ?
1) Document that the following test program results in two different
lines under Win32/Win64 (unless you change the console codepage to be
equal to the process codepage before running the program):
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, '');
  Rewrite (T);
  A := S;
  WriteLn (A);
  WriteLn (T, A);
  Close (T);
end.
2) Document that compiling the following test program and running it
with the output redirected to a file named output1.txt results in two
files with different content (again, unless you change the console
codepage to the process codepage before running the program):
{$CODEPAGE CP1250}
const
  S = 'Úžasné';
var
  A: ansistring;
  T: text;
begin
  Assign (T, 'output2.txt');
  Rewrite (T);
  A := S;
  WriteLn (A);
  WriteLn (T, A);
  Close (T);
end.
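
To see whether the console codepage and the process codepage actually
differ on a given machine, something along these lines could be used.
It is only a minimal sketch: the program name is made up, and it
assumes the Windows unit's GetACP and GetConsoleOutputCP together with
the RTL variable DefaultSystemCodePage (FPC 3.0 or later). The values
printed depend entirely on the local setup.

```pascal
program showcodepages;  { hypothetical name, for illustration only }

{$ifdef MSWINDOWS}
uses
  Windows;
{$endif}

begin
  { Codepage the RTL assumes for ansistrings without an explicit codepage }
  WriteLn ('DefaultSystemCodePage: ', DefaultSystemCodePage);
{$ifdef MSWINDOWS}
  { ANSI codepage of the process as reported by Windows }
  WriteLn ('GetACP:                ', GetACP);
  { Codepage currently used by the console for output }
  WriteLn ('GetConsoleOutputCP:    ', GetConsoleOutputCP);
{$endif}
end.
```

If the last two values differ, the two test programs above can be
expected to show the difference described.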
.
.
>> Not really accidental:
>>
>> r3606 | florian | 2006-05-20 23:42:58 +0200 (Sat, 20 May 2006) | 2
>> lines
>> * fix from Maxim Ganetsky to fix CRT output with non latin code pages,
>> should fix #6785
>>
>> (there were additional changes performed later, but the primary change
>> was this one)
>
> Does this handle UTF8 ?
In what sense? It works correctly (under Win32/Win64) if I change the
console codepage to 65001 (both for shortstrings and for ansistrings),
and it works correctly if I assign a constant to a UTF8String and write
it to the console, regardless of the console codepage.
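
The second case can be sketched as follows. This is only an
illustration, not the exact program used for testing: the program name
is made up, and the source file is assumed to be saved in CP1250 so
that it matches the $CODEPAGE directive.

```pascal
{$CODEPAGE CP1250}
program utf8const;  { hypothetical name, for illustration only }
const
  S = 'Úžasné';
var
  U: utf8string;
begin
  { The constant is converted to UTF-8 on assignment }
  U := S;
  { The string carries its own codepage, so the RTL can convert it
    for the console on output }
  WriteLn (U);
end.
```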
> Judging by the sources, I would think not:
>
> Interface
>
> {$mode fpc} // Shortstring is assumed
> {$i crth.inc}
>
> Const
> { Controlling consts }
> Flushing = false; {if true then don't buffer
> output}
> ConsoleMaxX = 1024;
> ConsoleMaxY = 1024;
> ScreenHeight : longint = 25;
> ScreenWidth : longint = 80;
>
> Type
> TCharAttr=packed record
> ch : char;
> attr : byte;
> end;
> TConsoleBuf=Array[0..ConsoleMaxX*ConsoleMaxY-1] of TCharAttr;
> PConsoleBuf=^TConsoleBuf;
>
> var
> ConsoleBuf : PConsoleBuf;
>
> Since every screen position handles only a single char (byte) there is
> no
> way this can handle UTF8. Maybe other "real" single-byte codepages,
> yes.
I assume that you're looking at the implementation for Linux, whereas
I'm talking about the implementation for Win32/Win64. I don't know
whether it makes any difference with regard to UTF-8 string handling on
Linux (I'm not aware of any particular issues, but I might be wrong).
Tomas