[fpc-pascal] Unicode chars losing information

Michael Van Canneyt michael at freepascal.org
Mon Mar 8 12:08:40 CET 2021



On Mon, 8 Mar 2021, Adriaan van Os via fpc-pascal wrote:

> adriaan% cat uniquizz-utf8.pas
>
> {$codepage utf8}
>
> program uniquizz;
> var
>   chars: UnicodeString;
> begin
>   chars := '⌘ key';
>   writeln(chars);
>   writeln(chars[1]);
>   writeln( 'size ', sizeOf( chars));
>   writeln( 'length ', length( chars));
> end.
>
> adriaan% fpc uniquizz-utf8.pas -FcUTF-8
> Free Pascal Compiler version 3.0.4 [2018/09/30] for x86_64
> Copyright (c) 1993-2017 by Florian Klaempfl and others
> Target OS: Darwin for x86_64
> Compiling uniquizz-utf8.pas
> Assembling (pipe) uniquizz-utf8.s
> Linking uniquizz-utf8
> 14 lines compiled, 0.1 sec
>
> [l24:~/gpc/testfpc] adriaan% ./uniquizz-utf8
> ? key
> ?
> size 8
> length 5
>
> ----
>
> This leaves me with a question mark too.

I get the same output, question marks, whether or not the -FcUTF-8 flag
is present.

But if I add

uses cwstring;

all will be well.

Rationale:
Without that unit, no widestring manager is installed, so the RTL cannot
convert whatever the compiler wrote in the binary to UTF-8 to display it
on the console.
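
For reference, a minimal sketch of the quoted program with the unit added.
The {$ifdef unix} guard is my assumption (cwstring is meant for Unix-like
targets such as Darwin); the rest is unchanged from the original test:

```pascal
{$codepage utf8}

program uniquizz;

{$ifdef unix}
uses
  cwstring;  { installs a widestring manager so the RTL can convert UnicodeString }
{$endif}

var
  chars: UnicodeString;
begin
  chars := '⌘ key';
  writeln(chars);                       { whole string }
  writeln(chars[1]);                    { first UTF-16 code unit }
  writeln('size ', SizeOf(chars));      { size of the string reference, not the text }
  writeln('length ', Length(chars));    { number of UTF-16 code units }
end.
```

Note that SizeOf reports the size of the string reference (a pointer, hence
8 on x86_64), so "size 8" in the quoted output is unrelated to the question
marks. Whether the ⌘ then shows up also depends on the terminal using a
UTF-8 locale.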

The compiler people will need to explain what exactly the compiler writes
with or without the flag.

Michael.

