[fpc-pascal] Printing unicode characters
Nikolay Nikolov
nickysn at gmail.com
Sun Dec 1 08:23:08 CET 2024
On 12/1/24 9:02 AM, Adriaan van Os via fpc-pascal wrote:
> Hairy Pixels via fpc-pascal wrote:
>> ChatGPT is saying I can print unicode scalars like that but i don’t
>> see it works and no compiler warnings even. Did it make this up or
>> did I do something wrong?
>>
>> Writeln('Unicode scalar 1F496: ', #$1F496); // 💖
>> Writeln('Unicode scalar 1F496: ', WideChar($1F496)); // 💖
>
> What people call "Unicode", even compiler manuals, is not "Unicode". I
> repeat it again and again, it is not "Unicode" but so-called
> "Unicode". Microsoft, and those who want to be compatible with it, use
> UTF-16 <https://en.wikipedia.org/wiki/UTF-16> treated as if it were
> UCS-2 <https://en.wikipedia.org/wiki/Universal_Coded_Character_Set>
>
> They call that "Unicode", which is plain nonsense. In the real world,
> one can not stuff 21-bits into 16-bits.
>
> For heaven's sake, let's stop talking about so-called "Unicode" and
> instead use UTF-8 <https://en.wikipedia.org/wiki/UTf-8> or UTF-32
> <https://en.wikipedia.org/wiki/UCS-4>.
Here's how Free Pascal types map to Unicode terminology:
WideChar = UTF-16 code unit
UnicodeString = UTF-16 encoded string
WideString = UTF-16 encoded string. On Windows it's not reference
counted; it maps to the COM BSTR type and is used for COM compatibility.
On other platforms, it's the same as UnicodeString.
UTF8String = UTF-8 encoded string. Defined as
UTF8String = type AnsiString(CP_UTF8);
UTF16String = alias for UnicodeString
Hope this clears things up.
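Applying this to the code point from the original question: U+1F496 is above U+FFFF, so it cannot be held in a single WideChar (one UTF-16 code unit). In a UnicodeString it has to be stored as a surrogate pair. Below is a minimal sketch, assuming {$mode objfpc}{$H+}. CodePointToUTF16 is a hypothetical helper written for illustration, not an RTL routine.

```pascal
program SurrogatePair;
{$mode objfpc}{$H+}

{ Hypothetical helper (not an RTL routine): encodes one Unicode code point
  as UTF-16, i.e. as one or two WideChar code units. This sketch does not
  validate CP (e.g. values above $10FFFF are not rejected). }
function CodePointToUTF16(CP: Cardinal): UnicodeString;
var
  V: Cardinal;
begin
  if CP <= $FFFF then
  begin
    { BMP code point: a single code unit }
    SetLength(Result, 1);
    Result[1] := WideChar(CP);
  end
  else
  begin
    { Supplementary plane: subtract $10000 to get a 20-bit value, then
      split it into a high and a low surrogate (10 bits each). }
    V := CP - $10000;
    SetLength(Result, 2);
    Result[1] := WideChar($D800 + (V shr 10));
    Result[2] := WideChar($DC00 + (V and $3FF));
  end;
end;

var
  U: UnicodeString;
begin
  U := CodePointToUTF16($1F496);
  WriteLn('Code units: ', Length(U));          { 2 }
  WriteLn('High: $', HexStr(Ord(U[1]), 4));   { D83D }
  WriteLn('Low:  $', HexStr(Ord(U[2]), 4));   { DC96 }
end.
```

The arithmetic is the standard UTF-16 rule: subtract $10000, put the top 10 bits into the high surrogate (base $D800) and the bottom 10 bits into the low surrogate (base $DC00). For U+1F496 that gives $D83D $DC96.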
Another thing:
For conversions between different encodings to work (e.g. between UTF-8
and UTF-16), a widestring manager must be installed. Some platforms
(like Windows) always include one by default, but others (e.g. Linux)
don't, to avoid bloating programs that don't need it. On those
platforms, you will typically need to add the unit cwstring to your
program's uses clause.
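A minimal sketch of that setup, assuming FPC 3.x with {$mode objfpc}{$H+}: the {$ifdef unix} guard pulls in cwstring only on Unix-like targets (on Windows a manager is already present), and the assignment S8 := S16 is the kind of UTF-16 to UTF-8 conversion described above. The program and variable names are illustrative only.

```pascal
program ConvertDemo;
{$mode objfpc}{$H+}

{$ifdef unix}
uses
  cwstring;  { installs a widestring manager on Unix-like targets }
{$endif}

var
  S16: UnicodeString;
  S8: UTF8String;
begin
  { U+1F496 as a UTF-16 surrogate pair }
  SetLength(S16, 2);
  S16[1] := WideChar($D83D);
  S16[2] := WideChar($DC96);

  { Implicit UTF-16 -> UTF-8 conversion; per the note above, this is
    where a widestring manager is needed on platforms such as Linux }
  S8 := S16;

  WriteLn('UTF-16 code units: ', Length(S16));  { 2 }
  WriteLn('UTF-8 bytes: ', Length(S8));          { 4 }
  WriteLn(S8);  { shows the character if the terminal uses UTF-8 }
end.
```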
Nikolay