[fpc-devel] Unicode support (again)

Mon Nov 10 16:48:38 CET 2008

I found that the current FPC does have Unicode support, but there are 
some problems.

 - WideStrings work fine with Unicode UCS-2, but they (of course) have 
issues similar to those of UTF-8 strings when surrogate pairs are used 
(which is rarely necessary in Europe and America).

 - FPC does not have a dedicated type "UTF8String". The type defined 
as "UTF8String" is just the same as ANSIString, so the compiler 
can't tell which one the programmer means and can't generate the 
appropriate code when it's necessary to distinguish between them (e.g. 
when it should automatically convert between locale-coded ANSIString, 
UTF8String and WideString).

 - by design (for the sake of speed), UTF8String (and WideString when 
surrogate pairs are used) counts in code units and not in Unicode 
characters, so the behavior is "unexpected" when doing things like s[i], 
pos(s), copy(), delete(), ... There are no _slow_ functions that provide 
the "expected" versions of s[i], pos(s), copy(), delete(), ... (I've yet 
to find out how I can print just the first character of a UTF8String :)

 - there is no decent "character" type for UTF-8 or UTF-16 coded 
characters (WideChar, a UCS-2 code, works if no surrogate pairs are used).

 - there are different options for how the compiler interprets the 
encoding of the source file. Apparently, if it detects the file to be 
UTF-8 coded and a certain (otherwise correct) option is set, even 
"s := 'hallo äöü'; " does not work as expected if s is a WideString. 
(Lazarus with default settings suffers from this problem.)
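
To illustrate the "count in code units" point and the "first character" 
question, here is a minimal sketch of how it could be done by hand. 
Utf8SeqLen and FirstUtf8Char are hypothetical helpers made up for this 
example, not RTL functions. The sketch assumes the string really holds 
UTF-8 bytes in an ANSIString, and it treats one code point as one 
"character" (so it ignores combining sequences).

```pascal
program FirstUtf8CharDemo;
{$mode objfpc}{$H+}

// Hypothetical helper (not part of the RTL): number of bytes in the
// UTF-8 sequence that starts with lead byte b. Invalid lead bytes and
// stray continuation bytes are treated as one-byte sequences.
function Utf8SeqLen(b: Byte): Integer;
begin
  if b < $80 then
    Result := 1                       // 0xxxxxxx: ASCII
  else if (b and $E0) = $C0 then
    Result := 2                       // 110xxxxx
  else if (b and $F0) = $E0 then
    Result := 3                       // 1110xxxx
  else if (b and $F8) = $F0 then
    Result := 4                       // 11110xxx
  else
    Result := 1;                      // continuation byte or invalid
end;

// Hypothetical helper: the bytes of the first code point of s.
// Assumption: s contains UTF-8 encoded data.
function FirstUtf8Char(const s: AnsiString): AnsiString;
var
  n: Integer;
begin
  Result := '';
  if Length(s) = 0 then
    Exit;
  n := Utf8SeqLen(Ord(s[1]));
  if n > Length(s) then
    n := Length(s);                   // truncated sequence: take what is there
  Result := Copy(s, 1, n);
end;

var
  s: AnsiString;
begin
  // "a-umlaut" built from explicit UTF-8 bytes ($C3 $A4), so the example
  // does not depend on how the compiler decodes the source file.
  s := Chr($C3) + Chr($A4) + 'bc';
  WriteLn(Length(s));                 // 4: Length counts bytes, not characters
  WriteLn(Length(FirstUtf8Char(s)));  // 2: the first character takes two bytes
end.
```

Note that s[1] alone would give only the byte $C3 here, which is exactly 
the "unexpected" behavior described above. A WideString version would 
need the analogous check for a surrogate pair at s[1] and s[2].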

-Michael