[fpc-pascal] Parse unicode scalar

Hairy Pixels genericptr at gmail.com
Tue Jul 4 06:17:33 CEST 2023



> On Jul 4, 2023, at 9:58 AM, Nikolay Nikolov via fpc-pascal <fpc-pascal at lists.freepascal.org> wrote:
> 
> You need to understand all these terms and know exactly what you need to do. E.g., are you dealing with keyboard input, are you dealing with the low-level parts of text display, are you searching for something in the text, or are you just passing strings around and letting the GUI deal with it? These are all different use cases, and they require a careful understanding of what Unicode thing you need to iterate over.
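To make the point about different units concrete, here is a small Free Pascal sketch that measures the same two-"character" string three ways. It assumes the text is held as UTF-8 bytes in a RawByteString and uses the System unit's UTF8Decode; CountCodePoints and Sample are hypothetical names written for illustration. Grapheme clusters (what a user perceives as one character, such as a letter followed by a combining accent) are a fourth unit that this sketch does not attempt to count.

```pascal
{$mode objfpc}{$H+}
program Utf8Units;

const
  // 'A' followed by U+1F44D (thumbs up), spelled out as UTF-8 bytes
  // so the result does not depend on the source file's encoding
  Sample = 'A'#$F0#$9F#$91#$8D;

{ Counts code points by counting every byte that is not a
  continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8. }
function CountCodePoints(const S: RawByteString): SizeInt;
var
  I: SizeInt;
begin
  Result := 0;
  for I := 1 to Length(S) do
    if (Ord(S[I]) and $C0) <> $80 then
      Inc(Result);
end;

var
  U: UnicodeString;
begin
  U := UTF8Decode(Sample);
  WriteLn('UTF-8 bytes:       ', Length(Sample));           // 5
  WriteLn('UTF-16 code units: ', Length(U));                // 3 (thumbs up is a surrogate pair)
  WriteLn('code points:       ', CountCodePoints(Sample));  // 2
end.
```

The UTF-16 count is the one a WideChar-based loop would see, which is why a single WideChar cannot represent 👍.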

Thanks for trying to help, but this is more complicated than I thought, and I don't have the patience for a deep dive right now :)

Unicode is complicated under the hood, but we should have some libraries to help, right? I mean, the user thinks of these things as "characters", be it "A" or the Unicode symbol 👍, so we should be able to operate on that basis as well. Something like an iterator that returns the character (wide char) and its byte offset would be a nice place to start.
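A minimal sketch of such an iterator, assuming the file has been read into a RawByteString holding UTF-8. DecodeUtf8 and Sample are hypothetical names written for this example, not FPC RTL routines. It yields Unicode scalar values as a Cardinal rather than a WideChar, because a WideChar is one UTF-16 code unit and cannot hold U+1F44D (👍) on its own; byte positions are 1-based to match Pascal string indexing.

```pascal
{$mode objfpc}{$H+}
program Utf8Scan;

uses
  SysUtils;  // IntToHex

const
  // 'A', U+1F44D (thumbs up, 4 bytes in UTF-8), 'b'
  Sample = 'A'#$F0#$9F#$91#$8D'b';

{ Decodes the UTF-8 sequence that starts at byte StartAt (1-based) of S.
  Returns the code point and sets Len to the number of bytes it occupies.
  Malformed or truncated input yields U+FFFD and consumes a single byte. }
function DecodeUtf8(const S: RawByteString; StartAt: SizeInt; out Len: SizeInt): Cardinal;
var
  B: Byte;
  Need, K: SizeInt;
  MinCP: Cardinal;
begin
  Len := 1;
  Result := $FFFD;                   // default: replacement character
  B := Ord(S[StartAt]);
  if B < $80 then
    Exit(B)                          // plain ASCII
  else if (B and $E0) = $C0 then
  begin Need := 1; Result := B and $1F; MinCP := $80; end
  else if (B and $F0) = $E0 then
  begin Need := 2; Result := B and $0F; MinCP := $800; end
  else if (B and $F8) = $F0 then
  begin Need := 3; Result := B and $07; MinCP := $10000; end
  else
    Exit;                            // stray continuation byte or invalid lead byte
  if StartAt + Need > Length(S) then
    Exit($FFFD);                     // sequence cut off by the end of the string
  for K := 1 to Need do
  begin
    B := Ord(S[StartAt + K]);
    if (B and $C0) <> $80 then
      Exit($FFFD);                   // expected a continuation byte 10xxxxxx
    Result := (Result shl 6) or (B and $3F);
  end;
  // Reject overlong forms, UTF-16 surrogates and values above U+10FFFF
  if (Result < MinCP) or ((Result >= $D800) and (Result <= $DFFF)) or (Result > $10FFFF) then
    Exit($FFFD);
  Len := Need + 1;
end;

var
  I, SeqLen: SizeInt;
  CP: Cardinal;

begin
  I := 1;
  // Walk the string one code point at a time, reporting each one's byte index
  while I <= Length(Sample) do
  begin
    CP := DecodeUtf8(Sample, I, SeqLen);
    WriteLn('byte ', I, ': U+', IntToHex(Int64(CP), 4), ', length ', SeqLen);
    Inc(I, SeqLen);
  end;
end.
```

This should print `byte 1: U+0041, length 1`, `byte 2: U+1F44D, length 4` and `byte 6: U+0062, length 1`. Because malformed bytes decode to U+FFFD and advance by one byte, the loop always makes progress. Note that it iterates code points, not grapheme clusters, so an emoji with a skin-tone modifier would come out as two items.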

I have a parser/tokenizer I want to update, so I'm trying to find tokens by advancing one character at a time. That's why I need to know which character comes next in the file, and probably its byte offset too, so it can be referenced later.
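For the tokenizer case, here is a sketch under stated assumptions: the input is valid UTF-8, delimiters and operators are ASCII, and (as a simplification) every non-ASCII code point is treated as an identifier character. It relies on a property of UTF-8: bytes below $80 never occur inside a multi-byte sequence, so the lead byte alone tells you whether you are looking at an ASCII delimiter and how many bytes to skip to reach the next character. TToken, Tokenize, Utf8SeqLen and IsIdentChar are hypothetical names.

```pascal
{$mode objfpc}{$H+}
program Utf8Tokenize;

type
  TToken = record
    Offset: SizeInt;       // 1-based byte index of the token's first byte
    Text: RawByteString;   // the token's UTF-8 bytes
  end;
  TTokenArray = array of TToken;

const
  // 'x = grün + 👍;' spelled out as UTF-8 bytes
  Src = 'x = gr'#$C3#$BC'n + '#$F0#$9F#$91#$8D';';

{ Number of bytes in the UTF-8 sequence introduced by lead byte B.
  Invalid lead bytes are stepped over one byte at a time. }
function Utf8SeqLen(B: Byte): SizeInt;
begin
  if B < $80 then Result := 1
  else if (B and $E0) = $C0 then Result := 2
  else if (B and $F0) = $E0 then Result := 3
  else if (B and $F8) = $F0 then Result := 4
  else Result := 1;
end;

{ Identifier characters: ASCII letters, digits, '_' and, as a simplification,
  any non-ASCII code point (recognised by a lead byte >= $80). }
function IsIdentChar(B: Byte): Boolean;
begin
  Result := (Chr(B) in ['A'..'Z', 'a'..'z', '0'..'9', '_']) or (B >= $80);
end;

function Tokenize(const S: RawByteString): TTokenArray;
var
  I, Start, N: SizeInt;
begin
  Result := nil;
  N := 0;
  I := 1;
  while I <= Length(S) do
  begin
    if S[I] in [' ', #9, #10, #13] then
    begin
      Inc(I);                        // skip ASCII whitespace
      Continue;
    end;
    Start := I;
    if IsIdentChar(Ord(S[I])) then
    begin
      // advance one code point at a time until the identifier ends
      while (I <= Length(S)) and IsIdentChar(Ord(S[I])) do
        Inc(I, Utf8SeqLen(Ord(S[I])));
    end
    else
      Inc(I);                        // any other ASCII character is a one-byte token
    SetLength(Result, N + 1);
    Result[N].Offset := Start;
    Result[N].Text := Copy(S, Start, I - Start);
    Inc(N);
  end;
end;

var
  Toks: TTokenArray;
  T: TToken;
begin
  Toks := Tokenize(Src);
  for T in Toks do
    WriteLn(T.Offset:3, '  ', T.Text);
end.
```

It should report the byte offsets 1, 3, 5, 11, 13 and 17 for `x`, `=`, `grün`, `+`, `👍` and `;`. Storing the byte offset rather than a character index means a token can later be located again with `Copy(S, Offset, Length(Text))` without re-scanning the string.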


Regards,
Ryan Joseph


