[fpc-devel] Unicode support (again)

Wed Nov 12 09:25:29 CET 2008

> Lazarus assumes that an ansistring contains always utf-8. This is not
> generally true.
While this might be true, I think it is a consequence of a shortcoming
of FPC, which simply treats the types ANSIString and UTF8String as
identical. IMHO (in a future version) it should keep track of the
encoding of each string type.

I suggest that there should be the native string types ANSIString,
UTF8String, UCS2String and UTF16String, together with the corresponding
character types ANSIChar, UTF8Char, UCS2Char and UTF16Char (UTF8Char and
UTF16Char in fact being short strings, since one character can take
several code units). The compiler and RTL should take care of any
conversion between those (and convert constants appropriately when they
are assigned).
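
To make the conversion requirement concrete, here is a rough sketch in
Python (not FPC code) of the same text held in each of the proposed
encodings. The mapping table and the convert() helper are hypothetical
names made up for this illustration. "ANSI" is assumed to mean
Windows-1252, and UCS-2 is modelled with the UTF-16 codec, which gives
the same bytes as long as the text stays within the Basic Multilingual
Plane.

```python
# Hypothetical mapping from the proposed type names to Python codecs.
# Assumption: "ANSI" means Windows-1252 here; a real system could use
# any code page. UCS-2 is modelled as UTF-16 restricted to BMP text.
ENCODINGS = {
    "ANSIString": "cp1252",
    "UTF8String": "utf-8",
    "UCS2String": "utf-16-le",
    "UTF16String": "utf-16-le",
}


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw string data from one proposed type to another."""
    return data.decode(ENCODINGS[src]).encode(ENCODINGS[dst])


text = "Grüße €"  # 7 characters, all inside the BMP

ansi = text.encode(ENCODINGS["ANSIString"])
utf8 = convert(ansi, "ANSIString", "UTF8String")
utf16 = convert(utf8, "UTF8String", "UTF16String")
back = convert(utf16, "UTF16String", "ANSIString")

for name, data in (("ANSIString", ansi),
                   ("UTF8String", utf8),
                   ("UTF16String", utf16)):
    print(name, len(data), "bytes")
```

The byte counts differ per encoding although the text is the same, which
is why an assignment between two of these types cannot be a plain memory
copy.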

Now there should be compiler options to let the user select which type
he wants to use for the generic "String" type (all four applicable) and
which he wants to use for the generic WideString type (UCS2String and
UTF16String applicable). The generic Char and WideChar types are then
assigned accordingly.

Moreover, for all native string types this version should provide both
Unicode-character ("code point") counted and "code unit" counted
functions and procedures for what we know as s[i], length(s), pos(),
copy(), delete(), ...

There should be compiler options (for all native string types) to let
the user select which of the two he wants to use for the generic s[i],
length(s), pos(), copy(), delete(), ... notation.
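
The difference between the two counting modes can be sketched as follows.
This is Python rather than FPC code, and all function names are
hypothetical, chosen only for this illustration. Indexes are 1-based to
mirror Pascal's copy(). The sketch relies on Python's str being indexed
by code point; the code-unit variants work on the encoded bytes.

```python
# Size in bytes of one code unit for each encoding used below.
UNIT_SIZE = {"utf-8": 1, "utf-16-le": 2}


def code_units(s: str, encoding: str) -> list:
    """Split the encoded form of s into its individual code units."""
    data = s.encode(encoding)
    size = UNIT_SIZE[encoding]
    return [data[i:i + size] for i in range(0, len(data), size)]


def length_code_points(s: str) -> int:
    """length(s) counted in Unicode code points."""
    return len(s)


def length_code_units(s: str, encoding: str) -> int:
    """length(s) counted in code units of the given encoding."""
    return len(code_units(s, encoding))


def copy_code_points(s: str, index: int, count: int) -> str:
    """copy(s, index, count) counted in code points (1-based)."""
    return s[index - 1:index - 1 + count]


def copy_code_units(s: str, encoding: str, index: int, count: int) -> bytes:
    """copy(s, index, count) counted in code units (1-based)."""
    units = code_units(s, encoding)
    return b"".join(units[index - 1:index - 1 + count])


s = "a\u20ac\U0001F600"  # 'a', EURO SIGN, one character outside the BMP

print(length_code_points(s))              # 3
print(length_code_units(s, "utf-8"))      # 8
print(length_code_units(s, "utf-16-le"))  # 4
print(copy_code_points(s, 2, 1))          # the euro sign
print(copy_code_units(s, "utf-8", 2, 2))  # b'\xe2\x82'
```

The last call returns only two of the three bytes of the euro sign, so a
code-unit-counted copy() can cut a character in half. The code-point
variant avoids that, but on a real UTF-8 or UTF-16 buffer it has to scan
from the start of the string, which is the trade-off the user would be
choosing between.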

With this provided, Lazarus would be able to provide whichever API for 
LCL they want in a decent and highly compatible way (They _should_ allow 
the user to select if he wants to link in an ANSIString, UTF8String or 
UCS2String version).

Moreover, this would allow for tuning a project for space or for speed
according to the platform we want to compile it for (e.g. a 32-bit PC or
an ARM-based cell phone).

-Michael