[fpc-pascal] Unicode chars losing information

Tomas Hajny XHajT03 at hajny.biz
Mon Mar 8 12:20:14 CET 2021


On 2021-03-08 11:59, Adriaan van Os via fpc-pascal wrote:


Hi,

> adriaan% cat uniquizz-utf8.pas
> 
> {$codepage utf8}
> 
> program uniquizz;
> var
>   chars: UnicodeString;
> begin
>   chars := '⌘ key';
>   writeln(chars);
>   writeln(chars[1]);
>   writeln( 'size ', sizeOf( chars));
>   writeln( 'length ', length( chars));
> end.
> 
> adriaan% fpc uniquizz-utf8.pas -FcUTF-8
> Free Pascal Compiler version 3.0.4 [2018/09/30] for x86_64
> Copyright (c) 1993-2017 by Florian Klaempfl and others
> Target OS: Darwin for x86_64
> Compiling uniquizz-utf8.pas
> Assembling (pipe) uniquizz-utf8.s
> Linking uniquizz-utf8
> 14 lines compiled, 0.1 sec
> 
> [l24:~/gpc/testfpc] adriaan% ./uniquizz-utf8
> ? key
> ?
> size 8
> length 5
> 
> ----
> 
> This leaves me with a question mark too.

Technically, a UnicodeString variable is a pointer, so SizeOf(UnicodeString)
always returns 8 on 64-bit platforms, regardless of the string content;
Length returns the number of UTF-16 code units instead. As Michael already
answered regarding the question-mark output, you need a widestring manager
(on Unix-like systems, e.g. the cwstring unit) to translate the characters
from the internal storage (UTF-16; see uniquizz-utf8.s if you compile with
-a) to your terminal charset.

Tomas

