[fpc-pascal] Parse unicode scalar

Nikolay Nikolov nickysn at gmail.com
Sun Jul 2 19:20:02 CEST 2023


On 7/2/23 16:30, Hairy Pixels via fpc-pascal wrote:
> I'm interested in parsing unicode scalars (I think that's what they're called) into byte-sized values, but I'm not sure where to start. The first thing I did was choose the unicode scalar U+1F496 (💖).

"Unicode scalar" is not a term on its own in Unicode terminology; the glossary only defines "Unicode scalar value" (any code point except the surrogates):

https://unicode.org/glossary/

So, what do you mean? A Unicode code point? An Extended Grapheme 
Cluster? Or something else? There are also several ways to encode 
Unicode into a byte sequence - UTF-8, UTF-16LE, UTF-16BE, UTF-32, etc.
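
If UTF-8 is the encoding in question, the bytes follow from the standard UTF-8 bit layout: a code point in the range U+10000..U+10FFFF is split into groups of 3, 6, 6 and 6 bits, which are placed in the pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. The program below is only a minimal sketch of that layout. EncodeUtf8 and TUtf8Bytes are hypothetical names made up for this illustration, not RTL routines, and the function does not reject surrogates or values above $10FFFF.

```pascal
program Utf8EncodeSketch;
{$mode objfpc}

type
  TUtf8Bytes = array of Byte;

// Hypothetical helper: returns the UTF-8 bytes for one code point.
// Assumption: cp is a valid scalar value; no validation is done here.
function EncodeUtf8(cp: LongWord): TUtf8Bytes;
begin
  if cp < $80 then
  begin
    // 0xxxxxxx
    SetLength(Result, 1);
    Result[0] := Byte(cp);
  end
  else if cp < $800 then
  begin
    // 110xxxxx 10xxxxxx
    SetLength(Result, 2);
    Result[0] := Byte($C0 or (cp shr 6));
    Result[1] := Byte($80 or (cp and $3F));
  end
  else if cp < $10000 then
  begin
    // 1110xxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 3);
    Result[0] := Byte($E0 or (cp shr 12));
    Result[1] := Byte($80 or ((cp shr 6) and $3F));
    Result[2] := Byte($80 or (cp and $3F));
  end
  else
  begin
    // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    SetLength(Result, 4);
    Result[0] := Byte($F0 or (cp shr 18));
    Result[1] := Byte($80 or ((cp shr 12) and $3F));
    Result[2] := Byte($80 or ((cp shr 6) and $3F));
    Result[3] := Byte($80 or (cp and $3F));
  end;
end;

var
  b: Byte;
begin
  // U+1F496 is the code point from the question.
  for b in EncodeUtf8($1F496) do
    Write(b, ' ');
  WriteLn;
end.
```

For $1F496 this prints 240 159 146 150. UTF-16 or UTF-32 would give a different byte sequence for the same code point.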

>
> Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me the scalar consists of these 4 bytes:
>
>   240 159 146 150
>
> I was able to correctly concatenate these characters and writeln printed the correct character.
>
> var
> 	s: String;
> begin
> s := char(240)+char(159)+char(146)+char(150);
> writeln(s);
> end.
>
> The question is, how was 1F496 decomposed into 4 bytes?

I guess you should ask ChatGPT, who gave you the answer ;-)

Nikolay


