[fpc-devel] Some questions and proposals about cpstring
Alex Shishkin
alexvins at mail.ru
Wed Oct 12 11:33:16 CEST 2011
1) Why was UTF8String made incompatible with AnsiString(CP_UTF8)
(UTF8String = type AnsiString(CP_UTF8);)? Why not an alias?
2) The same question applies to RawByteString.
3) Why is UnicodeString a separate type? Shouldn't it be
AnsiString(CP_UTF16)? If not, what is AnsiString(CP_UTF16)?
4) If an ansistring can now contain text in any supported encoding, I
think this one type is enough to support both single-byte encodings and
Delphi's UTF16. All that is needed is a modeswitch that maps string to a
UTF16-encoded _AnsiString_ (!). There is no need to have two (or more)
RTLs (for UTF8 or UTF16, for example), because the encoding info is
already included in _any_ longstring.
4.1) However, for speed reasons, there should internally be separate
code paths that handle particular encodings quickly and avoid
conversions (for [Lower|Upper]Case, for example), but the interface of
the RTL units should remain unchanged.
5) My proposed changes to cpstring:
5.1) If a string is defined without an explicit encoding (e.g. just
"string" with the H+ switch, or "ansistring"), then a variable of that
type can contain a value in any encoding and preserves the encoding of
the assigned expression (no encoding conversion). Call it a "universal
string". It is not the same _type_ as ansistring(some_codepage).
5.2) In unicode Delphi mode the encoding of all string constant values
is forced to UTF16; the source encoding can be anything. Strings can be
forced to UTF16, but can also be "universal strings" (see 5.1).
5.3) All RTL string routine parameters are "universal strings". There
is no need for separate unicode versions, because they'll already be
unicode-aware.
5.4) UTF8String, RawByteString and UnicodeString are aliases rather than
unique types.
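
To illustrate 5.1: as far as I understand, RawByteString is the closest
existing thing to the "universal string" described there. Below is a
minimal sketch, assuming a compiler with cpstring support where
StringCodePage and the CP_UTF8 constant are available from the System
unit. The program name, the TMyUtf8 type and the variable names are made
up for illustration.

```pascal
program cpdemo;
{$mode objfpc}{$H+}

type
  // Distinct type, declared the same way as UTF8String in question 1.
  TMyUtf8 = type AnsiString(CP_UTF8);

var
  U8: TMyUtf8;
  Raw: RawByteString;
begin
  U8 := 'abc';
  // Assumption: assigning to a RawByteString performs no conversion,
  // so the code page of the source value travels with the data.
  Raw := U8;
  WriteLn('TMyUtf8 code page:       ', StringCodePage(U8));
  WriteLn('RawByteString code page: ', StringCodePage(Raw));
end.
```

If both lines report the same code page, the variable without a fixed
encoding has kept the encoding of the assigned expression, which is the
behaviour 5.1 asks for plain "string" and "ansistring".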