[fpc-devel] String and UnicodeString and UTF8String

LacaK lacak at zoznam.sk
Wed Jan 12 07:16:40 CET 2011


>
> ...: the new ansistring type has a hidden "element size" field (in 
> addition to the reference count, length and codepage), and from what I 
> can see at page 10 of 
> http://edn.embarcadero.com/article/images/38980/Delphi_and_Unicode.pdf, 
> Delphi 2009's unicodestring is simply an ansistring(1200).
So it seems that if we had a "GenericString" with the properties 
"reference count", "size", "character width" and "codepage", then all 
other string types could be based on it. The other string types would 
then be just "shortcuts" that internally use the same structure:
 AnsiString = GenericString(with the actual system ANSI code page (0), 
 or without any explicit codepage ($ffff))
 UTF8String = GenericString(with UTF-8 encoding)
 UnicodeString = GenericString(with UTF-16 encoding)
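
To make the idea concrete, here is a minimal sketch of such a structure. 
It is written in Python only as a neutral model; it is not FPC or Delphi 
code. The names GenericString, make and convert are hypothetical, the 
field layout is illustrative rather than the real header layout, and 
SYSTEM_ACP = 1250 is just an assumed system ANSI code page. The values 
65001 (UTF-8) and 1200 (UTF-16 LE) are the usual Windows code page 
identifiers.

```python
from dataclasses import dataclass

CP_ACP = 0        # "actual system ANSI code page"
CP_UTF8 = 65001   # Windows identifier for UTF-8
CP_UTF16 = 1200   # Windows identifier for UTF-16 little endian
SYSTEM_ACP = 1250 # ASSUMPTION: stand-in for the system ANSI code page

# Map code page identifiers to Python codec names (illustrative subset).
CODECS = {CP_UTF8: "utf-8", CP_UTF16: "utf-16-le", 1250: "cp1250"}


def codec_for(codepage: int) -> str:
    """Resolve a code page identifier, treating 0 as the system code page."""
    if codepage == CP_ACP:
        codepage = SYSTEM_ACP
    return CODECS[codepage]


@dataclass
class GenericString:
    """Hypothetical common header plus payload shared by all string types."""
    data: bytes        # raw payload
    codepage: int      # encoding of the payload
    element_size: int  # "character width" in bytes (1 or 2 here)
    refcount: int = 1  # reference count

    @property
    def length(self) -> int:
        # Length counted in elements (code units), not in bytes.
        return len(self.data) // self.element_size


def make(text: str, codepage: int) -> GenericString:
    """Build a string with the given code page from abstract text."""
    element_size = 2 if codepage == CP_UTF16 else 1
    return GenericString(text.encode(codec_for(codepage)), codepage, element_size)


def convert(s: GenericString, codepage: int) -> GenericString:
    """Convert between "shortcut" types by re-encoding the payload."""
    text = s.data.decode(codec_for(s.codepage))
    return make(text, codepage)
```

In this model UTF8String is make(text, CP_UTF8), UnicodeString is 
make(text, CP_UTF16), and an assignment between them is a call to 
convert. A string without any explicit codepage ($ffff) is deliberately 
left out, because no conversion can be defined for it.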

So it seems to me that there is agreement on adding "character width" 
and "codepage" to the internal "string" record structure and on 
providing conversions where needed, isn't there? (More or less the same 
approach as in Delphi.)

Where there is no agreement is on what the default string encoding 
should be (AnsiString($ffff), UTF-8, UTF-16 or UTF-32).

So, to return to my original question: is there any agreement on some 
points related to the "future of the String type"?

P.S. I still do not understand how things can work correctly if the LCL 
expects all AnsiStrings (String) to be UTF8Strings, but the RTL/FCL does 
not strictly follow this (at least on Windows).
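
A small sketch of the kind of mismatch I mean, again in Python only as a 
neutral model. The function names are hypothetical, and cp1250 is only 
an assumed system ANSI code page; nothing here is actual RTL or LCL 
code. One side hands over bytes in the ANSI code page, the other side 
reads them as UTF-8, and neither knows about the other's assumption 
because the string carries no codepage.

```python
ANSI_CODEC = "cp1250"  # ASSUMPTION: stand-in for the system ANSI code page


def rtl_like_result(name: str) -> bytes:
    """Model of a routine that returns text in the system ANSI code page."""
    return name.encode(ANSI_CODEC)


def lcl_like_read(raw: bytes) -> str:
    """Model of a consumer that assumes every string is UTF-8.

    Invalid byte sequences are replaced with U+FFFD instead of raising.
    """
    return raw.decode("utf-8", errors="replace")


def survives(name: str) -> bool:
    """True if the text is unchanged after crossing the boundary."""
    return lcl_like_read(rtl_like_result(name)) == name


if __name__ == "__main__":
    for candidate in ("tea.txt", "\u010daj.txt"):
        print(repr(candidate), "survives:", survives(candidate))
```

Pure ASCII text passes through unchanged, which may be why the problem 
often goes unnoticed, while any character outside ASCII is damaged 
unless a conversion is inserted at the boundary.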

-Laco.


