[fpc-devel] Some questions and proposals about cpstring

Alex Shishkin alexvins at mail.ru
Wed Oct 12 11:33:16 CEST 2011


1) Why is UTF8String made incompatible with AnsiString(CP_UTF8)
( UTF8String = type AnsiString(CP_UTF8); )? Why not an alias?
2) Same question about RawByteString.

3) Why is UnicodeString a separate type? Shouldn't it be 
AnsiString(CP_UTF16)? If not, what is AnsiString(CP_UTF16)?

4) If an ansistring can now contain text in any supported encoding, I 
think this one type is enough to support both single-byte encodings and 
Delphi's UTF16. All that is needed is a modeswitch to map string to a 
UTF16-encoded _AnsiString_ (!). There is no need to have two (or more) 
RTLs (for UTF8 or UTF16, for example) because the encoding info is 
already included in _any_ longstring.
4.1) However, for speed reasons, there should internally be separate 
code paths that handle particular encodings quickly and avoid 
conversions (for [Lower|Upper]Case, for example), but the interface of 
the RTL units should remain unchanged.

5) My proposed changes to cpstring.
5.1) If a string is declared without an explicit encoding (e.g. just 
"string" with the H+ switch, or "ansistring"), then a variable of that 
type can contain a value in any encoding and preserves the encoding of 
the assigned expression (no encoding conversion). Call it a "universal 
string". It is not the same _type_ as ansistring(some_codepage).
5.2) In unicode Delphi mode the encoding of all string constant values 
is forced to UTF16, whatever the source encoding is. Strings can be 
forced to UTF16, but can also be "universal strings" (see 5.1).
5.3) All RTL string routine parameters are "universal strings". There 
is no need for separate unicode versions, because they'll already be 
unicode-aware.
5.4) UTF8String, RawByteString, UnicodeString are aliases rather than 
unique types.
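
To make the difference between a string with a declared code page and 
the "universal string" of 5.1 concrete, here is a minimal sketch. It 
assumes a compiler with code page aware strings (FPC 3.0 or later, or 
Delphi 2009 or later), not necessarily the state of the cpstrings 
branch. TLatin1String is a hypothetical name used only for 
illustration, and RawByteString stands in for the proposed behaviour, 
since it is the existing type that keeps the encoding of whatever is 
assigned to it.

```pascal
program cpdemo;
{$mode objfpc}{$H+}

type
  // Hypothetical type for illustration: AnsiString declared as ISO-8859-1.
  TLatin1String = type AnsiString(28591);

var
  u: UTF8String;
  l: TLatin1String;
  r: RawByteString;
begin
  u := 'abc';
  // The declared code pages differ, so the RTL converts on assignment.
  l := u;
  // RawByteString has no code page of its own: no conversion happens
  // and the value keeps the code page it arrived with.
  r := l;

  // StringCodePage reports the code page stored with each value.
  WriteLn('u: ', StringCodePage(u));
  WriteLn('l: ', StringCodePage(l));
  WriteLn('r: ', StringCodePage(r));
end.
```

Under 5.1 a plain "ansistring" variable would behave like r here, 
while ansistring(some_codepage) variables would keep converting as l 
does.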
