[fpc-pascal] Printing unicode characters

Hairy Pixels genericptr at gmail.com
Sun Dec 1 14:37:01 CET 2024


On Dec 1, 2024 at 2:23:08 PM, Nikolay Nikolov via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:

> Here's how Free Pascal types map to Unicode terminology:
>
> WideChar = UTF-16 code unit
>
> UnicodeString = UTF-16 encoded string
>
> WideString = UTF-16 encoded string. On Windows it's not reference
> counted - used for COM compatibility. On other platforms, it's the same
> as UnicodeString.
>
> UTF8String = UTF-8 encoded string. Defined as UTF8String=type
> AnsiString(CP_UTF8).
>
> UTF16String = alias for UnicodeString
>
> Hope this clears things up.
>
>
> Another thing:
>
> For conversions between different encodings to work (e.g. between UTF-8
> and UTF-16), you need to install a widestring manager. Some platforms
> (like Windows) always include one by default, but other platforms (e.g.
> Linux) don't, in order to reduce bloat, for programs that don't need it.
> For these, you may need to include unit cwstring or something like that.


Needing that unit is easy to miss; it seems you need it any time you deal
with Unicode on those platforms. I’m not sure how it even knows to change
the meaning of those character constants.
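
A minimal sketch of how the unit is usually pulled in, assuming FPC 3.x on a Unix-like target. As far as I understand it, cwstring only installs the conversion routines at startup; how a literal in the source file is interpreted is controlled separately by the {$codepage} directive. The program name and variable names are made up for illustration, and which conversions really go through the widestring manager may differ between FPC versions.

```pascal
program ConvertDemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

uses
  {$ifdef unix}cwstring,{$endif}  // installs a widestring manager on Unix-like targets
  SysUtils;

var
  u8: UTF8String;
  u16: UnicodeString;
begin
  u8 := 'Hello, 🌎!';
  u16 := u8;  // implicit UTF-8 -> UTF-16 conversion on assignment
  writeln('UTF-8 length (bytes): ', Length(u8));
  writeln('UTF-16 length (code units): ', Length(u16));
end.
```

Listing cwstring early in the uses clause is the common convention, so the manager is in place before other units start converting strings.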

Using the term “char” was maybe a mistake. It misleads people into
thinking it’s a “character” as they perceive it, when really it’s just a
UTF-16 code unit. Why isn’t there a “UnicodeChar” type which is 4 bytes
and holds a full Unicode code point? That’s probably what most people
expect when they think “unicode” and “character”. There are still compound
characters, which appear as one character but are actually several code
points overlaid, but getting at the component parts is still useful.
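
To make the compound-character point concrete, here is a small sketch, assuming FPC 3.x. It builds "e" followed by U+0301 (combining acute accent), which a terminal would normally draw as a single "é". It also prints the size of UCS4Char, which I believe the system unit already declares as a type wide enough for any single code point; treat that as an assumption to check against your FPC version.

```pascal
program CombiningDemo;
{$mode objfpc}{$H+}

var
  s: UnicodeString;
begin
  s := WideChar($0065);      // 'e'
  s := s + WideChar($0301);  // combining acute accent
  // One perceived character, but two code points (and two UTF-16 code units).
  writeln('Code units in s: ', Length(s));
  writeln('SizeOf(WideChar): ', SizeOf(WideChar));
  writeln('SizeOf(UCS4Char): ', SizeOf(UCS4Char));
end.
```

So even a 4-byte character type would not give one element per perceived character; it only removes the surrogate-pair problem.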

Choosing UTF-16 for UnicodeString was probably a mistake too. It’s my
understanding that nearly all websites are UTF-8, which means this encoding
will dominate everything. I think UTF-8 is by far the most used, right?

As a user I would expect that assigning a string constant to a
UnicodeString would let me iterate over it one UnicodeChar at a time.
That’s logical, right? Maybe this is just left undone as of now. I don’t
know.

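{$mode objfpc}
{$codepage utf8}
var
  u: UnicodeChar;  // ideally a 4-byte type holding one full code point
  s: UnicodeString;
begin
  s := 'Hello, 🌎!';
  // expectation: one iteration per character, including the emoji
  for u in s do
    writeln(u);
end.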
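
For comparison, a sketch of how per-code-point iteration can be done by hand today, assuming FPC 3.x. As far as I can tell, the system unit declares UnicodeChar as an alias for WideChar, so the loop above would walk UTF-16 code units and the emoji (U+1F30E) would arrive as two surrogate halves. NextCodePoint is a hypothetical helper written for this example, not an RTL routine.

```pascal
program CodePoints;
{$mode objfpc}{$H+}
{$codepage utf8}  // assumption: this source file is saved as UTF-8

// Hypothetical helper: decode the code point starting at s[i] and
// advance i past it (by one or two UTF-16 code units).
function NextCodePoint(const s: UnicodeString; var i: SizeInt): UCS4Char;
var
  lead, trail: Word;
begin
  lead := Ord(s[i]);
  Inc(i);
  Result := lead;  // default: a BMP code unit (or an unpaired surrogate) as-is
  if (lead >= $D800) and (lead <= $DBFF) and (i <= Length(s)) then
  begin
    trail := Ord(s[i]);
    if (trail >= $DC00) and (trail <= $DFFF) then
    begin
      // Valid surrogate pair: combine both halves into one code point.
      Inc(i);
      Result := UCS4Char($10000 + ((lead - $D800) shl 10) + (trail - $DC00));
    end;
  end;
end;

var
  s: UnicodeString;
  i: SizeInt;
  cp: UCS4Char;
begin
  s := 'Hello, 🌎!';
  i := 1;
  while i <= Length(s) do
  begin
    cp := NextCodePoint(s, i);
    writeln('U+', HexStr(LongInt(cp), 6));
  end;
end.
```

Printing the code point values rather than the characters keeps the sketch independent of the terminal's encoding and of any widestring manager.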


Regards,
    Ryan Joseph