[fpc-devel] Re: enumerators

Thu Nov 18 13:56:02 CET 2010

On 11/18/2010 12:33 AM, Hans-Peter Diettrich wrote:
> Separator characters can be assumed as ASCII, so that they can be 
> found by a dumb byte/char scan; only few encodings have to be 
> recognized and handled, based on the char size: MBCS (UTF-8...), 
> WideChars (UTF-16/UCS2) and UTF-32.
>
In fact I suppose that for UTF-8 ("pure UTF-8" without surrogates) pos() 
works for all strings, and a UTF-8 "character" is itself a string. The 
only restriction is that the result of pos() may be used solely as the 
position argument of copy() or delete(), and that the length argument 
for copy() or delete() has to be calculated as a difference between 
pos() results or Length(String) values. This makes it hard to extract a 
single Unicode character from a UTF-8 string, but of course it's easy to 
create a library function that takes a pos() result and, by decoding the 
UTF-8 code at that position, returns a UTF-8 string containing the next 
Unicode character. (UTF-8 coded surrogate pairs may need additional 
attention.)
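
A minimal sketch of such a library function, assuming the string holds 
well-formed UTF-8 and that the given index points at the first byte of a 
character. The names Utf8SeqLen and Utf8CharAt are hypothetical, not 
existing RTL routines, and UTF-8 coded surrogate pairs are not handled:

```pascal
program Utf8NextChar;
{$mode objfpc}{$H+}

// Hypothetical helper: byte length of the UTF-8 sequence whose first byte is B.
// Returns 1 for ASCII and also for an unexpected byte (for example a
// continuation byte), so that a caller scanning the string always advances.
function Utf8SeqLen(B: Byte): Integer;
begin
  if B < $80 then Result := 1                   // 0xxxxxxx: ASCII
  else if (B and $E0) = $C0 then Result := 2    // 110xxxxx
  else if (B and $F0) = $E0 then Result := 3    // 1110xxxx
  else if (B and $F8) = $F0 then Result := 4    // 11110xxx
  else Result := 1;
end;

// Hypothetical helper: the UTF-8 string containing the character that
// starts at byte index P (for example a pos() result). Returns '' when P
// is out of range. Copy() clips the count at the end of the string.
function Utf8CharAt(const S: AnsiString; P: Integer): AnsiString;
begin
  Result := '';
  if (P < 1) or (P > Length(S)) then Exit;
  Result := Copy(S, P, Utf8SeqLen(Ord(S[P])));
end;

var
  S, Key: AnsiString;
  P: Integer;
begin
  // "größe=1" written as raw UTF-8 bytes so the source encoding does not matter
  S := 'gr' + #$C3#$B6 + #$C3#$9F + 'e=1';

  // The ASCII separator is found by a plain byte scan; the result is used
  // only as a position and to compute a length for Copy().
  P := Pos('=', S);
  Key := Copy(S, 1, P - 1);
  WriteLn(Length(Key));               // 7 (bytes, not characters)

  // Byte index 3 is the start of the two-byte sequence for "ö"
  WriteLn(Length(Utf8CharAt(S, 3)));  // 2
end.
```

Both helpers work purely on byte indices, so they combine with pos(), 
copy() and delete() under the restriction described above.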

-Michael