[fpc-pascal] Parse unicode scalar

Jer Haan jdehaan2014 at gmail.com
Sun Jul 2 18:16:08 CEST 2023


Hi Ryan,

I’ve created the attached unit, which takes a code point and returns the corresponding UTF-8 character as a string.
It’s based on the Wikipedia article on UTF-8.
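A minimal sketch of such a function follows. It assumes Free Pascal 3.x in {$mode objfpc}. It is not the contents of the attached uencoding.pas, which was scrubbed from the archive, and the name CodePointToUTF8 is hypothetical. Each branch fills the x bits of one UTF-8 byte pattern with bits of the code point. Surrogates (U+D800 to U+DFFF) and values above U+10FFFF are not Unicode scalar values, so this sketch returns an empty string for them.

```pascal
program Utf8Sketch;

{$mode objfpc}{$H+}

// Hypothetical helper (not the attached uencoding.pas): encodes one
// Unicode scalar value as UTF-8 bytes; returns '' for invalid input.
function CodePointToUTF8(cp: Cardinal): RawByteString;
begin
  if cp <= $7F then
    // 1 byte: 0xxxxxxx
    Result := Chr(cp)
  else if cp <= $7FF then
    // 2 bytes: 110xxxxx 10xxxxxx
    Result := Chr($C0 or (cp shr 6)) +
              Chr($80 or (cp and $3F))
  else if (cp >= $D800) and (cp <= $DFFF) then
    // UTF-16 surrogates are not scalar values
    Result := ''
  else if cp <= $FFFF then
    // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    Result := Chr($E0 or (cp shr 12)) +
              Chr($80 or ((cp shr 6) and $3F)) +
              Chr($80 or (cp and $3F))
  else if cp <= $10FFFF then
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    Result := Chr($F0 or (cp shr 18)) +
              Chr($80 or ((cp shr 12) and $3F)) +
              Chr($80 or ((cp shr 6) and $3F)) +
              Chr($80 or (cp and $3F))
  else
    // Above U+10FFFF: outside the Unicode code space
    Result := '';
end;

var
  s: RawByteString;
  i: Integer;
begin
  s := CodePointToUTF8($1F496);
  // Print the byte values separated by spaces
  for i := 1 to Length(s) do
  begin
    if i > 1 then
      Write(' ');
    Write(Ord(s[i]));
  end;
  WriteLn;
end.
```

Run as is, it prints 240 159 146 150, the same four bytes given in the question below. Printing the byte values rather than the string itself avoids depending on the terminal's encoding.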

UTF-8 encodes code points in one to four bytes, depending on the value of the code point. The x characters are replaced by the bits of the code point:
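
  First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
  U+0000            U+007F           0xxxxxxx
  U+0080            U+07FF           110xxxxx  10xxxxxx
  U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
  U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
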
This table is copied from Wikipedia.
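Working one case through by hand answers the question in the quoted message below. U+1F496 lies in the range U+10000 to U+10FFFF, so it uses the four-byte pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx. That pattern has 3 + 6 + 6 + 6 = 21 x positions. The code point is written in binary, padded with leading zeros to 21 bits, split into groups of 3, 6, 6 and 6 bits, and each group is copied into the x positions of one byte:

$$
\begin{aligned}
\texttt{1F496}_{16} &= 000\;011111\;010010\;010110_2 \\
\text{byte 1} &= 11110\,000_2 = \texttt{F0}_{16} = 240 \\
\text{byte 2} &= 10\,011111_2 = \texttt{9F}_{16} = 159 \\
\text{byte 3} &= 10\,010010_2 = \texttt{92}_{16} = 146 \\
\text{byte 4} &= 10\,010110_2 = \texttt{96}_{16} = 150
\end{aligned}
$$

These are exactly the bytes 240 159 146 150 that ChatGPT reported.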

Hope it’s useful for you. If you improve the code, please let me know.

Best regards,
Jeroen



On 2 Jul 2023, at 15:30, Hairy Pixels via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:

I'm interested in parsing Unicode scalars (I think that's what they're called) into byte-sized values, but I'm not sure where to start. The first thing I did was choose the Unicode scalar U+1F496 (💖).

Next I cheated and asked ChatGPT. :) Amazingly, from my question it was able to tell me that in UTF-8 the scalar is encoded as these 4 bytes:

240 159 146 150

I concatenated these bytes as characters, and writeln printed the correct character.

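var
  s: String;
begin
  // Concatenate the four UTF-8 bytes of U+1F496 into one string;
  // the terminal must expect UTF-8 for writeln to show the character
  s := Char(240) + Char(159) + Char(146) + Char(150);
  writeln(s);
end.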

The question is, how was 1F496 decomposed into 4 bytes? 

Regards,
	Ryan Joseph

_______________________________________________
fpc-pascal maillist  -  fpc-pascal at lists.freepascal.org
https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal

-------------- next part --------------
A non-text attachment was scrubbed...
Name: PastedGraphic-1.png
Type: image/png
Size: 69509 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-pascal/attachments/20230702/f7f0ec0e/attachment-0001.png>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: uencoding.pas
Type: application/octet-stream
Size: 2012 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-pascal/attachments/20230702/f7f0ec0e/attachment-0001.obj>