[fpc-pascal] Parse unicode scalar
Jer Haan
jdehaan2014 at gmail.com
Sun Jul 2 18:16:08 CEST 2023
Hi Ryan,
I’ve created the attached unit, which takes a code point and returns the UTF-8 encoded character as a string.
It’s based on the Wikipedia article on UTF-8.
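UTF-8 encodes code points in one to four bytes, depending on the value of the code point. The x characters are replaced by the bits of the code point:

First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
U+0000            U+007F           0xxxxxxx
U+0080            U+07FF           110xxxxx  10xxxxxx
U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
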
This table is copied from Wikipedia.
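The attached uencoding.pas is not reproduced in the archive. As a rough sketch of the same idea, the Wikipedia table can be turned into a function that picks the row from the code point's range and fills in the x bits with shifts and masks. The name CodePointToUTF8 is hypothetical, and returning an empty string for surrogates (U+D800..U+DFFF) and for values above U+10FFFF is an assumption made here, not necessarily what the attached unit does.

```pascal
program Utf8Demo;

{$mode objfpc}  { enables Result; without {$H+}, String is a ShortString }

{ Hypothetical helper (not the attached uencoding.pas): encodes one Unicode
  scalar value using the bit patterns from the Wikipedia UTF-8 table.
  Assumption: surrogates and values above U+10FFFF yield ''. }
function CodePointToUTF8(cp: Cardinal): String;
begin
  if cp <= $7F then
    { 1 byte: 0xxxxxxx }
    Result := Chr(Byte(cp))
  else if cp <= $7FF then
    { 2 bytes: 110xxxxx 10xxxxxx }
    Result := Chr(Byte($C0 or (cp shr 6)))
            + Chr(Byte($80 or (cp and $3F)))
  else if (cp >= $D800) and (cp <= $DFFF) then
    { surrogate halves are not Unicode scalar values }
    Result := ''
  else if cp <= $FFFF then
    { 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx }
    Result := Chr(Byte($E0 or (cp shr 12)))
            + Chr(Byte($80 or ((cp shr 6) and $3F)))
            + Chr(Byte($80 or (cp and $3F)))
  else if cp <= $10FFFF then
    { 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx }
    Result := Chr(Byte($F0 or (cp shr 18)))
            + Chr(Byte($80 or ((cp shr 12) and $3F)))
            + Chr(Byte($80 or ((cp shr 6) and $3F)))
            + Chr(Byte($80 or (cp and $3F)))
  else
    { beyond the Unicode code space }
    Result := '';
end;

var
  s: String;
  i: Integer;
begin
  s := CodePointToUTF8($1F496);
  { print the individual byte values, then the character itself }
  for i := 1 to Length(s) do
    Write(Ord(s[i]), ' ');
  WriteLn;
  WriteLn(s);
end.
```

Run as-is, this should print the byte values 240 159 146 150 and then the heart character, assuming the terminal expects UTF-8.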

Hope it’s useful for you. If you improve the code, please let me know.
Best regards,
Jeroen
On 2 Jul 2023, at 15:30, Hairy Pixels via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
I'm interested in parsing Unicode scalars (I think that's what they're called) into byte-sized values, but I'm not sure where to start. The first thing I did was choose the Unicode scalar U+1F496 (💖).
Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me the scalar is made up of these 4 bytes:
240 159 146 150
I was able to correctly concatenate these characters and writeln printed the correct character.
var
  s: String;
begin
  { build the UTF-8 byte sequence for U+1F496 by hand }
  s := char(240)+char(159)+char(146)+char(150);
  writeln(s);
end.
The question is, how was 1F496 decomposed into 4 bytes?
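Worked through by hand, assuming the standard UTF-8 rules (RFC 3629): U+1F496 lies between U+10000 and U+10FFFF, so it takes four bytes. Its value is written as 21 bits, split into groups of 3, 6, 6 and 6 bits, and each group is appended to the fixed prefix 11110 (first byte) or 10 (continuation bytes):

```latex
\begin{aligned}
\mathtt{0x1F496} &= 128150_{10}
  = \underbrace{000}_{3}\,\underbrace{011111}_{6}\,\underbrace{010010}_{6}\,\underbrace{010110}_{6}
  \quad (21\text{ bits})\\
\text{byte}_1 &= 11110\,000_2 = \mathtt{0xF0} = 240\\
\text{byte}_2 &= 10\,011111_2 = \mathtt{0x9F} = 159\\
\text{byte}_3 &= 10\,010010_2 = \mathtt{0x92} = 146\\
\text{byte}_4 &= 10\,010110_2 = \mathtt{0x96} = 150
\end{aligned}
```

In shift-and-mask terms this is $F0 or (cp shr 18), then $80 or ((cp shr 12) and $3F), $80 or ((cp shr 6) and $3F), and $80 or (cp and $3F).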
Regards,
Ryan Joseph
_______________________________________________
fpc-pascal maillist - fpc-pascal at lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 69509 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-pascal/attachments/20230702/f7f0ec0e/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uencoding.pas
Type: application/octet-stream
Size: 2012 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-pascal/attachments/20230702/f7f0ec0e/attachment-0001.obj>