Hope it's useful for you. If you improve the code pls let me know.

This is perfect, thanks! Much more complicated than I thought.

I'm curious now: if you were going the other direction and parsing a string of different Unicode characters with different sequence lengths, how would you know which length each one was? For example, I started off knowing which Unicode scalar to use by looking at a table, but what if I had to find the character in a stream of text?

I think UTF-8 can have 1-4 byte characters, so you could encounter a 1-byte character followed by 4-byte characters, all interleaved, and there's no header or terminator for each character. How is this solved?
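
For what it's worth, UTF-8 does carry a kind of header: the high bits of the first byte of each character say how many bytes the sequence occupies (`0xxxxxxx` is 1 byte, `110xxxxx` is 2, `1110xxxx` is 3, `11110xxx` is 4), and every following byte has the form `10xxxxxx`. Here is a minimal Python sketch of that idea. The function names `utf8_sequence_length` and `split_utf8` are made up for illustration, and the code assumes well-formed input; it does not reject overlong encodings, invalid lead bytes such as `0xC0`, or truncated data.

```python
def utf8_sequence_length(lead_byte: int) -> int:
    """Return the byte length of a UTF-8 sequence, judged from its first byte."""
    if lead_byte < 0x80:             # 0xxxxxxx: single-byte (ASCII) character
        return 1
    if 0xC0 <= lead_byte < 0xE0:     # 110xxxxx: start of a 2-byte sequence
        return 2
    if 0xE0 <= lead_byte < 0xF0:     # 1110xxxx: start of a 3-byte sequence
        return 3
    if 0xF0 <= lead_byte < 0xF8:     # 11110xxx: start of a 4-byte sequence
        return 4
    # 10xxxxxx is a continuation byte; 0xF8 and above never appear in UTF-8.
    raise ValueError(f"not a valid UTF-8 lead byte: {lead_byte:#04x}")


def split_utf8(data: bytes) -> list:
    """Split UTF-8 bytes into one chunk per character (assumes valid input)."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])  # indexing bytes yields an int
        chunks.append(data[i:i + n])
        i += n
    return chunks


if __name__ == "__main__":
    # Mix of 1-, 2-, 3-, and 4-byte characters: a, é, €, 😀
    for chunk in split_utf8("a\u00e9\u20ac\U0001F600".encode("utf-8")):
        print(len(chunk), chunk.hex())
```

Because continuation bytes always start with `10`, a decoder that lands in the middle of a stream can also skip forward to the next byte that is not a continuation byte and resynchronize from there.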

