[fpc-devel] Some questions and proposals about cpstring
    Hans-Peter Diettrich 
    DrDiettrich1 at aol.com
       
    Wed Oct 12 14:34:29 CEST 2011
    
    
  
Alex Shishkin wrote:
> 1) Why UTF8String made incompatible with AnsiString(CP_UTF8)
> ( UTF8String = type AnsiString(CP_UTF8); )? Why not an alias?
An alias would allow strings of *any* encoding to be assigned, with
possibly fatal consequences. A strict UTF8String type lets the compiler
insert an implicit conversion whenever required, so that such a string
can contain nothing but UTF-8 encoded characters.
> 2) Same question about RawByteString
Variables of type RawByteString have no fixed encoding. Any AnsiString
can be assigned to a RawByteString variable without conversion. But
when a RawByteString is assigned to a different string type, a
conversion may be necessary.
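
A minimal sketch of the assignments described above, assuming a compiler
with code page aware strings (Delphi 2009 or later, or an FPC release
that ships cpstrings support). The program and variable names are made
up for illustration. CP_UTF8 is code page 65001, so that is the tag one
would expect to see on a UTF8String. The values printed for the other
variables depend on the platform and the system code page, so no output
is shown.

```pascal
program CodePageSketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifndef FPC}{$APPTYPE CONSOLE}{$endif}

var
  a: AnsiString;      // system (native) code page
  u: UTF8String;      // declared as: type AnsiString(CP_UTF8)
  r: RawByteString;   // no fixed code page

begin
  a := 'abc';
  u := a;  // the compiler inserts a conversion to UTF-8 where needed
  r := u;  // no conversion: r carries the code page of its source
  WriteLn('u: ', StringCodePage(u));
  WriteLn('r: ', StringCodePage(r));
  a := r;  // a conversion to the system code page may happen here
  WriteLn('a: ', StringCodePage(a));
end.
```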
> 3) why UnicodeString is separate type? Does it should be 
> AnsiString(CP_UTF16)? If not what is AnsiString(CP_UTF16)?
Delphi only allows an element size of 1 for AnsiString (and
RawByteString). The reason is undocumented. One reason may be the type
of str[i], which is an AnsiChar for Ansi encodings and a WideChar for
UTF-16 encoding. Consequently a Char would need 4 bytes if AnsiStrings
with a variable element size were allowed.
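
The element size argument can be checked with a short sketch, again
assuming a compiler with code page aware strings. StringElementSize is
the System routine that reports the element size stored with a string.
The program and variable names are hypothetical. AnsiChar occupies 1
byte and WideChar 2 bytes, and the element type of str[i] is fixed by
the declared string type, not by the run-time code page.

```pascal
program ElementSizeSketch;
{$ifdef FPC}{$mode delphi}{$endif}
{$ifndef FPC}{$APPTYPE CONSOLE}{$endif}

var
  a: AnsiString;
  w: UnicodeString;

begin
  a := 'abc';
  w := 'abc';
  // Static size of the element type, then the size recorded with the
  // string instance itself.
  WriteLn('AnsiString:    ', SizeOf(AnsiChar), ' ', StringElementSize(a));
  WriteLn('UnicodeString: ', SizeOf(WideChar), ' ', StringElementSize(w));
end.
```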
> 4) If now ansistring can contain text in any supported encoding, I think 
> that this only type is enough to support both single-byte and Delphi`s 
> UTF16.
AnsiString is only the string type with the *native* (system) encoding,
i.e. type AnsiString(0). UTF8String is already a different type,
AnsiString(CP_UTF8).
> The only need is modeswitch to map string to UTF16-encoded 
> _AnsiString_ (!). There is no need to have two (or more) RTLs (for UTF8 
> or UTF16 f.e) because encoding info is already included in _any_ 
> longstring.
Right, there is no *technical* need. But there is a *practical* one:
reducing conversions between strings of different encodings, and
simplifying the implementation of procedures that work with strings
(e.g. ToUpper).
> 4.1)However, for speed reasons, internally there should be separate code 
> for quick handle particular encodings to avoid conversions (for 
> [Lower|Upper]Case for example), but interface of RTL units should remain 
> unchanged.
The fastest implementation would be a UCS4 string, with no encoding
conversions required inside any library. Conversions would then be
required only in calls to OS or other external library functions.
DoDi
    
    