[fpc-pascal] Parse unicode scalar

Hairy Pixels genericptr at gmail.com
Mon Jul 3 03:29:11 CEST 2023



> On Jul 2, 2023, at 11:16 PM, Jer Haan <jdehaan2014 at gmail.com> wrote:
> 
> This table is copied from Wikipedia. <uencoding.pas> Hope it’s useful for you. If you improve the code, please let me know.
> 

This is perfect, thanks! Much more complicated than I thought.

I'm curious now: if you were going the other direction and parsing a string of Unicode characters whose encoded sequences have different lengths, how would you know the length of each one? For example, I started off knowing which Unicode scalar to use by looking at a table, but what if I had to find the characters in a stream of text?

I think UTF-8 characters can be 1-4 bytes long, so you could encounter a 1-byte character followed by 4-byte characters, all interleaved, and there's no header or terminator for each character. How is this solved?
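
For what it's worth, my understanding is that UTF-8 handles this in the first byte of each sequence: its high bits say how many bytes follow (0xxxxxxx = 1 byte, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4), and every continuation byte looks like 10xxxxxx. Below is a minimal sketch of scanning a byte stream that way. The names Utf8SeqLen and Data are made up for illustration and are not taken from uencoding.pas, and the sketch assumes well-formed input (it does not check continuation bytes or reject overlong encodings).

```pascal
program Utf8Scan;
{$mode objfpc}{$H+}

{ Length in bytes of the UTF-8 sequence introduced by Lead.
  Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
  Hypothetical helper, for illustration only. }
function Utf8SeqLen(Lead: Byte): Integer;
begin
  if (Lead and $80) = $00 then Result := 1        { 0xxxxxxx: ASCII }
  else if (Lead and $E0) = $C0 then Result := 2   { 110xxxxx }
  else if (Lead and $F0) = $E0 then Result := 3   { 1110xxxx }
  else if (Lead and $F8) = $F0 then Result := 4   { 11110xxx }
  else Result := 0;
end;

const
  { U+0041, U+00E9, U+20AC and U+1F600 encoded as UTF-8 (1, 2, 3 and 4 bytes) }
  Data: array[0..9] of Byte =
    ($41, $C3, $A9, $E2, $82, $AC, $F0, $9F, $98, $80);
var
  i, k, len: Integer;
  cp: LongInt;
begin
  i := 0;
  while i <= High(Data) do
  begin
    len := Utf8SeqLen(Data[i]);
    if (len = 0) or (i + len - 1 > High(Data)) then
    begin
      WriteLn('invalid or truncated sequence at byte ', i);
      Break;
    end;
    { Keep the payload bits of the lead byte, then append the low
      6 bits of each continuation byte. }
    if len = 1 then
      cp := Data[i]
    else
      cp := Data[i] and ($FF shr (len + 1));
    for k := 1 to len - 1 do
      cp := (cp shl 6) or (Data[i + k] and $3F);
    WriteLn(len, ' byte(s): U+', HexStr(cp, 6));
    Inc(i, len);
  end;
end.
```

Because continuation bytes always start with the bits 10, a parser that lands in the middle of a character can also skip forward to the next byte that is not of that form, so no separate header or terminator is needed.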

Regards,
Ryan Joseph


