[fpc-devel] Re: enumerators
Michael Schnell
mschnell at lumino.de
Thu Nov 18 13:56:02 CET 2010
On 11/18/2010 12:33 AM, Hans-Peter Diettrich wrote:
> Separator characters can be assumed as ASCII, so that they can be
> found by a dumb byte/char scan; only few encodings have to be
> recognized and handled, based on the char size: MBCS (UTF-8...),
> WideChars (UTF-16/UCS2) and UTF-32.
>
In fact, I suppose that for UTF-8 ("pure UTF-8" without surrogates) pos()
works for all strings, and a UTF-8 "character" is itself a string. The
only restriction is on how the result of pos() may be used: as the
position argument of copy() or delete(), or to calculate the length
argument for copy() or delete() as a difference between two pos()
results or Length(String) values. This makes it hard to extract a single
Unicode character from a UTF-8 string, but of course it's easy to create
a library function that takes a pos() result and, by decoding the UTF-8
code at that position, returns a UTF-8 string containing the next
Unicode character. (UTF-8-coded surrogate pairs may need additional
attention.)
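
A minimal sketch of such a library function, assuming Free Pascal with
AnsiString holding raw UTF-8 bytes. The names Utf8SeqLen and NextUtf8Char
are hypothetical, not RTL functions. The sketch only reads the lead byte
to find the sequence length; it does not validate continuation bytes and
does not join UTF-8-coded surrogate pairs.

```pascal
program Utf8NextChar;

{$mode objfpc}{$H+}

// Hypothetical helper (not an RTL function): byte length of the UTF-8
// sequence whose lead byte is at S[Index]; 1 for a malformed byte.
function Utf8SeqLen(const S: AnsiString; Index: Integer): Integer;
var
  Lead: Byte;
begin
  Lead := Ord(S[Index]);
  if Lead < $80 then
    Result := 1            // 0xxxxxxx: ASCII
  else if (Lead and $E0) = $C0 then
    Result := 2            // 110xxxxx
  else if (Lead and $F0) = $E0 then
    Result := 3            // 1110xxxx
  else if (Lead and $F8) = $F0 then
    Result := 4            // 11110xxx
  else
    Result := 1;           // stray continuation byte or invalid lead byte
  // Do not run past the end of a truncated string.
  if Index + Result - 1 > Length(S) then
    Result := Length(S) - Index + 1;
end;

// Hypothetical helper: the UTF-8 string holding the code point that
// starts at byte position Index (for example, derived from a Pos() result).
function NextUtf8Char(const S: AnsiString; Index: Integer): AnsiString;
begin
  if (Index < 1) or (Index > Length(S)) then
    Result := ''
  else
    Result := Copy(S, Index, Utf8SeqLen(S, Index));
end;

var
  S, Ch: AnsiString;
  P: Integer;
begin
  // 'a', U+00E9 (2 bytes), '=', U+20AC (3 bytes), written as raw bytes
  S := 'a'#$C3#$A9'='#$E2#$82#$AC;
  P := Pos('=', S);                 // byte index of the ASCII separator
  Ch := NextUtf8Char(S, P + 1);     // the sequence right after '='
  WriteLn(P, ' ', Length(Ch));      // prints "4 3"
  // Length argument computed from a Pos() result, as described above.
  WriteLn(Copy(S, 1, P - 1) = 'a'#$C3#$A9);
end.
```

Using P + 1 is only safe here because the separator is a single ASCII
byte; in general the next position is P plus the byte length of whatever
was found.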
-Michael