[fpc-pascal] Unicode chars losing information
Tomas Hajny
XHajT03 at hajny.biz
Mon Mar 8 12:20:14 CET 2021
On 2021-03-08 11:59, Adriaan van Os via fpc-pascal wrote:
Hi,
> adriaan% cat uniquizz-utf8.pas
>
> {$codepage utf8}
>
> program uniquizz;
> var
> chars: UnicodeString;
> begin
> chars := '⌘ key';
> writeln(chars);
> writeln(chars[1]);
> writeln( 'size ', sizeOf( chars));
> writeln( 'length ', length( chars));
> end.
>
> adriaan% fpc uniquizz-utf8.pas -FcUTF-8
> Free Pascal Compiler version 3.0.4 [2018/09/30] for x86_64
> Copyright (c) 1993-2017 by Florian Klaempfl and others
> Target OS: Darwin for x86_64
> Compiling uniquizz-utf8.pas
> Assembling (pipe) uniquizz-utf8.s
> Linking uniquizz-utf8
> 14 lines compiled, 0.1 sec
>
> [l24:~/gpc/testfpc] adriaan% ./uniquizz-utf8
> ? key
> ?
> size 8
> length 5
>
> ----
>
> This leaves me with a question mark too.
Technically, a UnicodeString is a pointer, so SizeOf(UnicodeString)
always returns 8 on 64-bit platforms regardless of the string content;
Length is what reports the number of UTF-16 code units. Michael already
answered regarding the question mark output: you need a widestring
manager to translate the characters from the internal storage (UTF-16;
see uniquizz-utf8.s if compiled with -a) to your terminal charset.
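
As a rough sketch only (not tested on your setup), the variant below
assumes a Unix-like target such as Darwin, where the cwstring unit
installs a libc-based widestring manager, and a terminal whose locale is
UTF-8. It also prints the first UTF-16 code unit in hex so that the
internal storage can be checked independently of the terminal.

```pascal
{$codepage utf8}

program uniquizz;

{$ifdef unix}
uses
  cwstring;  { installs a widestring manager on Unix-like targets }
{$endif}

var
  chars: UnicodeString;
begin
  chars := '⌘ key';

  { With a widestring manager installed, the UTF-16 data is converted
    to the terminal charset on output instead of degrading to '?'. }
  writeln(chars);
  writeln(chars[1]);

  { SizeOf reports the size of the pointer, not of the string data. }
  writeln('size ', SizeOf(chars));

  { Length counts UTF-16 code units, not bytes. }
  writeln('length ', Length(chars));
  writeln('bytes ', Length(chars) * SizeOf(WideChar));

  { First code unit in hex; 2318 is expected here if the source file
    really is saved as UTF-8. }
  writeln('first code unit ', HexStr(Ord(chars[1]), 4));
end.
```

If the hex value is right but the first two lines still show a question
mark, the string itself is intact and the remaining problem is the
conversion to the terminal charset.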
Tomas