[fpc-devel] Unicode support (again)
Michael Schnell
mschnell at lumino.de
Wed Nov 12 09:25:29 CET 2008
> Lazarus assumes that an ansistring contains always utf-8. This is not
> generally true.
While this might be true, I think it's a consequence of a shortcoming of
FPC, which simply treats the types ANSIString and UTF8String as
identical. IMHO a future version should keep track of the encoding of
each string type.
I suggest that there should be the native string types ANSIString,
UTF8String, UCS2String and UTF16String, together with the appropriate
character types ANSIChar, UTF8Char, UCS2Char and UTF16Char (UTF8Char and
UTF16Char in fact being strings themselves, since one character can take
several code units). The compiler and RTL should take care of any
conversion between those (and convert constants appropriately on
assignment when needed).
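To make the difference between the four proposed types concrete, here is
a small sketch in Python (not FPC). It is only an illustration: the
helper names code_units, fits_ucs2 and convert are made up for this
example, cp1252 merely stands in for "ANSI", and none of this reflects
how FPC or the RTL would actually implement the conversions.

```python
# Illustration only: Python codecs stand in for the proposed string types.
# "cp1252" is an assumed example of an ANSI code page.

UNIT_SIZE = {"cp1252": 1, "utf-8": 1, "utf-16-le": 2}  # bytes per code unit


def code_units(text: str, encoding: str) -> int:
    """Number of code units needed to store text in the given encoding."""
    return len(text.encode(encoding)) // UNIT_SIZE[encoding]


def fits_ucs2(text: str) -> bool:
    """True if every character fits in a single 16-bit unit (no surrogate pairs)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


def convert(data: bytes, src: str, dst: str) -> bytes:
    """Re-encode raw string data, as a compiler/RTL conversion would have to."""
    return data.decode(src).encode(dst)


sample = "Zürich €5"  # 9 characters, two of them outside ASCII

for enc in ("cp1252", "utf-8", "utf-16-le"):
    print(enc, code_units(sample, enc))

# Converting between encodings keeps the text but changes the storage.
ansi_bytes = sample.encode("cp1252")
utf8_bytes = convert(ansi_bytes, "cp1252", "utf-8")
print(len(ansi_bytes), len(utf8_bytes))

# A character outside the BMP needs two UTF-16 units and cannot be UCS-2.
print(fits_ucs2(sample), fits_ucs2("\U0001F600"))
```

The same nine characters need a different number of code units depending
on the type, which is why the conversions cannot be a plain byte copy.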
Then there should be compiler options to let the user select which type
he wants to use for the generic "String" type (all four applicable) and
which he wants to use for the generic WideString type (UCS2String and
UTF16String applicable). The generic Char and WideChar types are then
assigned accordingly.
Moreover, this version should provide, for all native string types, both
Unicode-character ("code point") counted and code-unit counted
functions and procedures for what we know as s[i], length(s), pos(),
copy(), delete(), ...
There should be compiler options (for all native string types) to let
the user select which of the two he wants to use for the generic s[i],
length(s), pos(), copy(), delete(), ... notation.
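As a rough sketch of what the two counting modes would mean for a UTF-8
string, consider the following Python (again not FPC). The functions
length and copy and the by_code_point flag are hypothetical stand-ins
for the generic routines and the proposed compiler option; the 1-based
index merely follows Pascal convention.

```python
# Illustration only: a UTF-8 string held as raw bytes, with a flag that
# plays the role of the proposed "code point vs. code unit" option.


def length(s: bytes, by_code_point: bool) -> int:
    """Length of a UTF-8 string, counted in code points or in code units."""
    if by_code_point:
        return len(s.decode("utf-8"))
    return len(s)


def copy(s: bytes, index: int, count: int, by_code_point: bool) -> bytes:
    """Pascal-style copy(s, index, count) with a 1-based index."""
    start = index - 1
    if by_code_point:
        return s.decode("utf-8")[start:start + count].encode("utf-8")
    return s[start:start + count]


s = "Zürich".encode("utf-8")  # 'ü' takes two code units in UTF-8

print(length(s, by_code_point=True))    # counts characters
print(length(s, by_code_point=False))   # counts bytes

whole = copy(s, 1, 2, by_code_point=True)   # "Zü", still valid UTF-8
split = copy(s, 1, 2, by_code_point=False)  # cuts 'ü' in half

try:
    split.decode("utf-8")
    split_is_valid = True
except UnicodeDecodeError:
    split_is_valid = False
print(whole, split, split_is_valid)
```

Code-unit counting is the cheap variant (plain indexing), while
code-point counting has to scan the string but never splits a character,
so which one suits a project depends on the trade-off the user wants.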
With this provided, Lazarus would be able to provide whichever API for
LCL they want in a decent and highly compatible way (They _should_ allow
the user to select if he wants to link in an ANSIString, UTF8String or
UCS2String version).
Moreover, this would allow for tuning a project for space or for speed
according to the platform we want to compile it for (e.g. a 32-bit PC or
an ARM-based cellphone).
-Michael