[fpc-devel] String and UnicodeString and UTF8String

LacaK lacak at zoznam.sk
Tue Jan 11 10:18:50 CET 2011


> I think at most two are required for any target: unicodestring (D2009 compatibility), and if really necessary because somehow the unicodestring version causes too much overhead, an ansistring($ffff) version as well. That's only for the classes though, I think most of the base RTL can be simply ansistring($ffff).
>   
So if I understand correctly, both the UnicodeString and AnsiString 
types must be extended so that they also hold information about the 
actual code page (encoding) of the string data they contain.
(AFAIK, at the moment they hold only the reference count, the size 
and, of course, the data.)

I am not an expert, so I do not understand every aspect of the problems 
involved in proper string handling, but some kind of implicit 
conversion, based on the actual encoding of the string data, is 
necessary (ANSI <-> UTF-8 <-> UTF-16 <-> ANSI, etc.).

One example is the known problem with the Euro currency symbol. On 
Windows it is stored in the CurrencyString global variable in the ANSI 
code page, but the LCL (which expects UTF-8 encoding) uses it without 
any explicit conversion, which leads to "?" being displayed instead of 
"€" (for example in TDBEdit or TDBGrid).
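
A minimal sketch of the byte-level effect, written in Python rather than 
Pascal so that it runs as-is. It only mimics the situation described 
above. The choice of cp1252 as the ANSI code page is an assumption, and 
all variable names are made up for the illustration.

```python
# Illustrative only: Python stands in for the RTL/LCL boundary.
ANSI_CODEPAGE = "cp1252"  # assumption: a Western European Windows ANSI code page

# What a CurrencyString-like variable would hold as raw bytes.
currency_bytes = "€".encode(ANSI_CODEPAGE)  # a single byte, 0x80

# A consumer that assumes UTF-8 cannot decode the lone 0x80 byte,
# so a replacement character is shown instead of the symbol.
shown_without_conversion = currency_bytes.decode("utf-8", errors="replace")

# An explicit ANSI -> UTF-8 conversion preserves the symbol.
utf8_bytes = currency_bytes.decode(ANSI_CODEPAGE).encode("utf-8")
shown_with_conversion = utf8_bytes.decode("utf-8")

print(repr(shown_without_conversion), repr(shown_with_conversion))
```

The point is that the bytes themselves are fine. What is missing is the 
knowledge, at the place where they are used, of which encoding they are in.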

Another problem appears when displaying character data in data-aware 
database controls (TDBEdit, TDBGrid). The data-aware controls (LCL) read 
data from TField descendants (FCL) through the TField.Text property, 
which returns a "string" (without code page information it is not clear 
whether it is an AnsiString, a UTF8String or a UnicodeString). The LCL 
expects UTF-8 strings, but that is not what it gets in all cases (for 
example with ODBC).
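
As a rough sketch of what a string that carries its own code page would 
make possible, here is a small Python model. TaggedText and to_utf8 are 
hypothetical names, not FPC or LCL API, and cp1250 is only an assumed 
driver encoding. The consumer converts based on the tag instead of 
guessing.

```python
import codecs
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical stand-in for a string that carries its own code page."""
    data: bytes
    codepage: str  # a Python codec name, e.g. "cp1250" or "utf-8"


def to_utf8(text: TaggedText) -> bytes:
    """Convert to UTF-8 only when the tag says the data is something else."""
    if codecs.lookup(text.codepage).name == "utf-8":
        return text.data
    return text.data.decode(text.codepage).encode("utf-8")


# A field value as an ANSI driver might deliver it (cp1250 is an assumption).
from_ansi_driver = TaggedText("5 €".encode("cp1250"), "cp1250")
# The same value from a source that already delivers UTF-8.
from_utf8_source = TaggedText("5 €".encode("utf-8"), "utf-8")

# Both arrive at the display layer as identical UTF-8 bytes.
same_bytes = to_utf8(from_ansi_driver) == to_utf8(from_utf8_source)
print(same_bytes)
```

With an untagged "string" the display layer has no way to make this 
decision, which is the situation described above for TField.Text.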

-Laco.


