[fpc-pascal] Printing unicode characters

Nikolay Nikolov nickysn at gmail.com
Sun Dec 1 08:23:08 CET 2024


On 12/1/24 9:02 AM, Adriaan van Os via fpc-pascal wrote:
> Hairy Pixels via fpc-pascal wrote:
>> ChatGPT is saying I can print unicode scalars like that, but I don’t 
>> see it working and there are no compiler warnings even. Did it make 
>> this up or did I do something wrong?
>>
>>   Writeln('Unicode scalar 1F496: ', #$1F496); // 💖
>>   Writeln('Unicode scalar 1F496: ', WideChar($1F496));  // 💖
>
> What people call "Unicode", even compiler manuals, is not "Unicode". I 
> repeat it again and again, it is not "Unicode" but so-called 
> "Unicode". Microsoft, and those who want to be compatible with it, use 
> UTF-16 <https://en.wikipedia.org/wiki/UTF-16> treated as if it were 
> UCS-2 <https://en.wikipedia.org/wiki/Universal_Coded_Character_Set>
>
> They call that "Unicode", which is plain nonsense. In the real world, 
> one cannot stuff 21 bits into 16 bits.
>
> For heaven's sake, let's stop talking about so-called "Unicode" and 
> instead use UTF-8 <https://en.wikipedia.org/wiki/UTf-8> or UTF-32 
> <https://en.wikipedia.org/wiki/UCS-4>.

Here's how Free Pascal types map to Unicode terminology:

WideChar = UTF-16 code unit (16 bits, so a single WideChar cannot hold a 
scalar value above $FFFF, such as $1F496)

UnicodeString = UTF-16 encoded string

WideString = UTF-16 encoded string. On Windows it is not reference 
counted, because it is used for COM compatibility. On other platforms, 
it's the same as UnicodeString.

UTF8String = UTF-8 encoded string. Defined as 
UTF8String = type AnsiString(CP_UTF8).

UTF16String = alias for UnicodeString
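
As a rough sketch of what this mapping means for the original question: 
a scalar value above $FFFF has to be split into two UTF-16 code units (a 
surrogate pair) before it fits into WideChars. The helper name 
ScalarToUTF16 below is made up for illustration and is not an RTL 
function. Whether the character actually shows up also depends on the 
console and on the widestring manager.

```pascal
program PrintScalar;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,  // widestring manager, needed for the conversion done by Writeln
  {$endif}
  SysUtils;

// Hypothetical helper: encodes one Unicode scalar value as UTF-16.
// Values above $FFFF become a high/low surrogate pair.
// (No validation of the input range is done in this sketch.)
function ScalarToUTF16(cp: Cardinal): UnicodeString;
begin
  if cp < $10000 then
  begin
    SetLength(Result, 1);
    Result[1] := WideChar(cp);
  end
  else
  begin
    Dec(cp, $10000);
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (cp shr 10));    // high surrogate
    Result[2] := WideChar($DC00 + (cp and $3FF));  // low surrogate
  end;
end;

begin
  // For $1F496 the two code units are $D83D and $DC96.
  Writeln('Unicode scalar 1F496: ', ScalarToUTF16($1F496));
end.
```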

Hope this clears things up.


Another thing:

For conversions between different encodings to work (e.g. between UTF-8 
and UTF-16), a widestring manager has to be installed. Some platforms 
(like Windows) always include one by default. Others (e.g. Linux) don't, 
to avoid bloat in programs that don't need it. On those, you need to add 
a unit that installs one, such as cwstring, to the uses clause of your 
program.
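
A minimal sketch of such a conversion, assuming a Unix-like target where 
cwstring is available (the program and variable names are only for 
illustration). It builds the UTF-8 bytes of U+1F496 by hand, lets the 
compiler-generated conversion turn them into UTF-16, and dumps the 
resulting code units so you can check whether the surrogate pair 
D83D DC96 came out:

```pascal
program ConvertDemo;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}
  cwstring,  // installs the widestring manager; not needed on Windows
  {$endif}
  SysUtils;

var
  u8: UTF8String;
  u16: UnicodeString;
  i, code: Integer;
begin
  // UTF-8 encoding of U+1F496, written byte by byte to avoid any
  // dependence on the source file's code page.
  SetLength(u8, 4);
  u8[1] := #$F0;
  u8[2] := #$9F;
  u8[3] := #$92;
  u8[4] := #$96;

  // Implicit UTF-8 -> UTF-16 conversion; this goes through the
  // installed widestring manager.
  u16 := u8;

  Write('UTF-16 code units:');
  for i := 1 to Length(u16) do
  begin
    code := Ord(u16[i]);
    Write(' ', IntToHex(code, 4));
  end;
  Writeln;
end.
```

If the dump shows something other than two code units, the conversion 
most likely ran without a proper widestring manager.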


Nikolay

