[fpc-devel] String and UnicodeString and UTF8String

Hans-Peter Diettrich DrDiettrich1 at aol.com
Tue Jan 11 17:10:32 CET 2011


Marco van de Voort wrote:

> Btw, while looking up rawbytestring I saw this in the Delphi help:
> 
> "Declaring variables or fields of type RawByteString should rarely, if ever,
> be done, because this practice can lead to undefined behavior and potential
> data loss."

IIRC RawByteString should be used like OpenString, i.e. only as a
subroutine parameter type. Despite its name, a RawByteString has a
variable encoding, i.e. implicit conversions are inserted for every use
with other string types. AnyByteString would therefore have been a
better name for that type, IMO. Delphi no longer (officially) supports
non-textual data in strings; TBytes should be used for such data.


> How will you deal with e.g. Windows? Legacy string=ansistring(0), D2009 is
> string=utf16 TUnicodestring?

Is a Delphi UnicodeString really compatible with a WinAPI
WideString/BSTR? AFAIR all BSTRs must reside in shared memory, so that
copies are required for every API call.


> Mainly the question what the classtree will be. The main operating type used
> in applications.  You always need two RTLs for that, since it can be 1 or 2
> byte, and even if you fixated it on one byte encodings, rawbytestring would
> force you to write case statements in each and every procedure.

UTF-8 combines a single (byte-based) storage type with lossless
encoding of full Unicode. Ansi and UCS2 (really UTF-16) only *look*
easier to handle in user code, but both will fail and require special
code whenever characters outside the assumed codepage (or, for UCS2,
outside the BMP) may occur.
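
To illustrate the point, here is a minimal sketch in Python (not Pascal,
only because it is easy to run anywhere). The sample text and the helper
name round_trips are my own assumptions, and cp1252 merely stands in for
"the assumed Ansi codepage". It checks which encodings preserve a string
that contains a character outside cp1252 and one outside the BMP:

```python
# Hypothetical sample: Latin text, a Cyrillic letter (not in cp1252),
# and a musical symbol that lies outside the Basic Multilingual Plane.
text = "Gr\u00fc\u00dfe \u0416 \U0001D11E"


def round_trips(s, encoding):
    """Return True if s survives an encode/decode cycle unchanged."""
    try:
        return s.encode(encoding).decode(encoding) == s
    except UnicodeEncodeError:
        # The encoding has no representation for some character in s.
        return False


utf8_ok = round_trips(text, "utf-8")    # byte-based and lossless
ansi_ok = round_trips(text, "cp1252")   # fails: U+0416 is not in cp1252

# UTF-16 stores the non-BMP character as a surrogate pair, so the number
# of 16-bit code units differs from the number of characters.
code_points = len(text)
utf16_units = len(text.encode("utf-16-le")) // 2

print(utf8_ok, ansi_ok, code_points, utf16_units)
```

Code that treats every 16-bit unit as one character hits the same kind of
special case here that Ansi code hits with characters outside its
codepage.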

DoDi



