[fpc-devel] Some questions and proposals about cpstring

Alex Shishkin alexvins at mail.ru
Wed Oct 12 14:42:42 CEST 2011


On 12.10.2011 16:34, Hans-Peter Diettrich wrote:
> Alex Shishkin wrote:
>> 1) Why was UTF8String made incompatible with AnsiString(CP_UTF8)
>> ( UTF8String = type AnsiString(CP_UTF8); )? Why not an alias?
>
> An alias would allow assigning strings of *any* encoding, with possibly
> fatal consequences. A strict UTF8String type allows for implicit
> conversion whenever required, so that such a string can contain nothing
> but UTF-8 encoded characters.
So if I declare "MyString : AnsiString(CP_UTF8)" and assign a
win1251-encoded string to it, no conversion will be made? That is strange.
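
To illustrate why this matters, here is a small sketch of the two possible outcomes at the byte level. It is written in Python only because it is easy to run; it does not show what FPC actually does, and the sample text and variable names are my own assumptions. It compares relabeling win1251 bytes as UTF-8 without conversion against a real conversion.

```python
# Sketch only: models the byte-level effect, not FPC's string implementation.
text = "Привет"

# The payload as a win1251 (cp1251) string: one byte per character.
raw_1251 = text.encode("cp1251")

# Case 1: the bytes are copied unchanged and merely labeled as UTF-8.
try:
    raw_1251.decode("utf-8")
    relabel_ok = True
except UnicodeDecodeError:
    # The cp1251 bytes do not form a valid UTF-8 sequence.
    relabel_ok = False

# Case 2: the bytes are actually converted from cp1251 to UTF-8.
converted = raw_1251.decode("cp1251").encode("utf-8")
roundtrip_ok = converted.decode("utf-8") == text

print(relabel_ok, roundtrip_ok, len(raw_1251), len(converted))
```

Without conversion the string carries a UTF-8 label over bytes that are not valid UTF-8, which is the situation I would call strange.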
>
>
>> 3) Why is UnicodeString a separate type? Shouldn't it be
>> AnsiString(CP_UTF16)? If not, what is AnsiString(CP_UTF16)?
>
> Delphi only allows for an element size of 1 for AnsiStrings (and
> RawByteString). The reason is unclear/undocumented. One reason may be
> the type of str[i], which is an AnsiChar for Ansi encoding, and a
> WideChar for UTF-16 encoding. Consequently a Char would need 4 bytes
> if AnsiStrings of variable element size were allowed.


>
>> 4) If ansistring can now contain text in any supported encoding, I
>> think that this single type is enough to support both single-byte
>> encodings and Delphi's UTF16.
>
> AnsiString is only a string type with *native* (system) encoding, i.e.
> type AnsiString(0). UTF8String is already a different type
> AnsiString(CP_UTF8).
>
>> All that is needed is a modeswitch to map string to a UTF16-encoded
>> _AnsiString_ (!). There is no need to have two (or more) RTLs (for
>> UTF8 or UTF16, for example) because encoding info is already included
>> in _any_ longstring.
>
> Right, there exists no *technical* need. But a *practical* need is the
> reduction of conversions between strings of different encodings, and
> the implementation of procedures that work with strings (e.g. ToUpper).
>
Procedures that work with strings should use RawByteString, and FPC,
unlike Delphi, might allow UTF-8 or UTF-16 for RawByteString.

>> 4.1) However, for speed reasons, internally there should be separate
>> code to handle particular encodings quickly and avoid conversions (for
>> [Lower|Upper]Case for example), but the interface of RTL units should
>> remain unchanged.
>
> The fastest implementation would be a UCS4 string, with no encoding
> conversions required inside any library. Conversions are then required
> only in calls to OS or other external library functions.
>
But UCS4 strings consume significantly more memory.
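
A rough sketch of the size difference, again in Python for convenience rather than FPC. The sample strings and the helper name `encoded_sizes` are my own, and only the payload bytes are counted (no string header, reference count or terminator). UCS4 is modeled here as UTF-32 without a byte order mark.

```python
# Sketch only: counts payload bytes per encoding for two sample strings.
samples = {"ascii": "Hello", "cyrillic": "Привет"}

def encoded_sizes(s):
    """Return the number of payload bytes for s in each encoding."""
    return {
        "utf-8": len(s.encode("utf-8")),
        "utf-16": len(s.encode("utf-16-le")),  # no BOM
        "ucs4": len(s.encode("utf-32-le")),    # fixed 4 bytes per character
    }

sizes = {name: encoded_sizes(s) for name, s in samples.items()}
for name, row in sizes.items():
    print(name, row)
```

For these samples UCS4 needs four times the space of UTF-8 for ASCII text and twice the space for Cyrillic text.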



