[fpc-devel] String and UnicodeString and UTF8String
lacak at zoznam.sk
Tue Jan 11 10:18:50 CET 2011
> I think at most two are required for any target: unicodestring (D2009 compatibility), and if really necessary because somehow the unicodestring version causes too much overhead, an ansistring($ffff) version as well. That's only for the classes though, I think most of the base RTL can be simply ansistring($ffff).
So if I understand correctly, the UnicodeString and AnsiString types
must be extended so that they also hold information about the actual
code page (encoding) of the string data they contain.
(AFAIK, at the moment they hold only the reference count, the size,
and of course the data.)
I am not an expert, so I do not understand all the aspects and problems
involved in proper string handling, but some kind of implicit
conversion (based on the actual encoding of the string data) is
necessary (ANSI <-> UTF-8 <-> UTF-16 <-> ANSI, etc.).
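To make the idea concrete, here is a rough sketch of a string that carries its code page along with its data and is converted based on that tag. It is written in Python only because that makes the bytes easy to inspect. It is not how the FPC RTL does or would do it. The TaggedString name, its fields and the convert method are hypothetical, and the code page names are Python codec names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedString:
    """Hypothetical string record: payload bytes plus the code page they use."""
    data: bytes
    codepage: str  # Python codec name, e.g. "cp1252", "utf-8", "utf-16-le"

    def convert(self, target: str) -> "TaggedString":
        """Return the same text re-encoded in the target code page."""
        if target == self.codepage:
            return self  # already in the requested encoding, nothing to do
        text = self.data.decode(self.codepage)  # interpret bytes using the tag
        return TaggedString(text.encode(target), target)


# The Euro sign as stored in Windows code page 1252 (assumed ANSI code page).
ansi = TaggedString(b"\x80", "cp1252")
utf8 = ansi.convert("utf-8")        # ANSI   -> UTF-8
utf16 = utf8.convert("utf-16-le")   # UTF-8  -> UTF-16
back = utf16.convert("cp1252")      # UTF-16 -> ANSI
```

Because every value knows its own encoding, the conversion never has to guess. Without the tag, the single byte $80 could not be told apart from invalid UTF-8.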
For example, there is the known problem with the Euro currency symbol.
On Windows it is stored in the CurrencyString global variable using the
ANSI code page, but it is used in the LCL (which expects UTF-8 encoding)
without any explicit conversion, which leads to "?" being displayed
instead of "€" (for example in TDBEdit or TDBGrid).
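The byte-level effect can be reproduced outside FPC. The following Python sketch assumes the ANSI code page is Windows-1252 (the real one depends on the system locale). The decoder here substitutes U+FFFD for the invalid byte, where the widget shows "?".

```python
# Euro sign as a Windows ANSI string (code page 1252 is an assumption).
ansi_bytes = "€".encode("cp1252")            # single byte 0x80

# What happens when those bytes are treated as UTF-8 without conversion:
# 0x80 is not a valid UTF-8 start byte, so it becomes a replacement char.
wrong = ansi_bytes.decode("utf-8", errors="replace")

# The explicit conversion that is missing: ANSI -> text -> UTF-8.
right = ansi_bytes.decode("cp1252").encode("utf-8")   # bytes E2 82 AC
shown = right.decode("utf-8")                          # what a UTF-8 consumer sees
```

Here wrong holds the replacement character, while shown holds the Euro sign again.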
Another problem occurs when displaying character data in data-aware
database controls (TDBEdit, TDBGrid). Data-aware controls (LCL) read
data from TField descendants (FCL) using the TField.Text property, which
returns a "string" (without code page information it is not clear
whether it is an AnsiString, a UTF8String or a UnicodeString). The LCL
expects UTF-8 strings, but that is not what it gets in all cases (for
example with ODBC).