[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Thu Nov 27 15:44:08 CET 2014

Michael Schnell wrote:

> I now understand that the "Element Size" field in the String header is 
> essentially a dummy, as under the hood there are two completely separate 
> concepts for one-byte strings and two-byte strings, and none for other 
> element sizes.

After a code review I realized that the element size field is specific 
to dynamic strings and not present in dynamic arrays. Since the element 
size is bound to the string type, it could be omitted from the FPC 
implementation. [With little gain, as long as the record alignment is preserved.]
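To illustrate why dropping the field gains little, here is a minimal sketch using Python's `ctypes` to lay out two hypothetical, simplified string headers under natural alignment. The field names and order (`code_page`, `element_size`, `ref_count`, `length`) are assumptions for illustration and are not the actual FPC record declaration. In this layout the two bytes freed by removing `element_size` are simply absorbed by the padding before the pointer-sized fields.

```python
import ctypes

# Hypothetical, simplified string header (NOT the real FPC record):
# two 16-bit fields followed by two pointer-sized fields.
class HeaderWithElementSize(ctypes.Structure):
    _fields_ = [
        ("code_page", ctypes.c_uint16),     # encoding identifier
        ("element_size", ctypes.c_uint16),  # bytes per character (1 or 2)
        ("ref_count", ctypes.c_ssize_t),    # pointer-sized reference count
        ("length", ctypes.c_ssize_t),       # pointer-sized character count
    ]

# Same header with the element size field omitted.
class HeaderWithoutElementSize(ctypes.Structure):
    _fields_ = [
        ("code_page", ctypes.c_uint16),
        ("ref_count", ctypes.c_ssize_t),
        ("length", ctypes.c_ssize_t),
    ]

def saved_bytes() -> int:
    """Bytes saved by dropping element_size, given ctypes' native alignment."""
    return ctypes.sizeof(HeaderWithElementSize) - ctypes.sizeof(HeaderWithoutElementSize)

if __name__ == "__main__":
    print("with element_size:   ", ctypes.sizeof(HeaderWithElementSize))
    print("without element_size:", ctypes.sizeof(HeaderWithoutElementSize))
    print("saved bytes:         ", saved_bytes())
```

With this particular field order the padding swallows the saving completely; a real header with a different layout or packing could of course behave differently.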

> This to me is not obvious at all, as the language syntax and the String 
> header data structure suggest a more universal paradigm for multiple 
> string type brands that each have an "element-size" and 
> "code-ID-number" setting, handled by a common infrastructure.

This may have been envisaged by the Delphi architects, but the idea was 
not pursued later.

> The "universal paradigm" would allow for extensions (e.g. UTF-32, 
> multiple 16-bit code pages, an additional fully dynamic String type, 
> n-byte "un-encoded" string types), as I described in the Wiki page.

Even if feasible, such arbitrary string storage can dramatically 
increase the number of implicit string conversions. An *efficient* 
implementation would be based on a single program-wide string 
representation, with different encodings handled only when exchanging 
data with external sources.
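The conversion argument can be made concrete with a small sketch. It is a hypothetical model, not FPC or Delphi code: each string value carries its own encoding tag (Python codec names stand in for code page numbers), and the names `TaggedString`, `Converter` and `run_pipeline` are invented for illustration. A value read from an external source is passed through several routines, each assumed to expect a particular string type, and then written out again. The sketch counts how many implicit conversions occur with mixed storage versus a single program-wide representation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical model: every string value carries its own encoding tag.
@dataclass(frozen=True)
class TaggedString:
    data: bytes
    encoding: str  # Python codec name standing in for a code page number

class Converter:
    """Performs implicit conversions and counts how many were actually needed."""

    def __init__(self) -> None:
        self.count = 0

    def to(self, s: TaggedString, target: str) -> TaggedString:
        if s.encoding == target:
            return s  # same representation: nothing to do
        self.count += 1
        return TaggedString(s.data.decode(s.encoding).encode(target), target)

def run_pipeline(text: str, source: str, steps: List[str], sink: str) -> Tuple[bytes, int]:
    """Read `text` in `source` encoding, pass it through routines that each
    expect the encoding listed in `steps`, then write it out in `sink`.
    Returns the output bytes and the number of conversions performed."""
    conv = Converter()
    s = TaggedString(text.encode(source), source)  # exchange with external input
    for expected in steps:
        s = conv.to(s, expected)  # implicit conversion at each call boundary
        # ... the routine would operate on s here ...
    s = conv.to(s, sink)  # exchange with external output
    return s.data, conv.count

if __name__ == "__main__":
    text = "Grüße"
    # Arbitrary storage: routines declared with different string types.
    mixed = ["utf-8", "utf-16-le", "cp1252", "utf-8", "utf-16-le"]
    # Single program-wide representation (UTF-16 chosen here as an example).
    single = ["utf-16-le"] * len(mixed)
    print(run_pipeline(text, "cp1252", mixed, "cp1252")[1])   # 6 conversions
    print(run_pipeline(text, "cp1252", single, "cp1252")[1])  # 2 conversions
```

With mixed storage the count grows with every boundary between differently declared string types, while with a single internal representation it stays at two (one on input, one on output) regardless of how many routines the string passes through.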

That standard encoding may be Ansi or Unicode; even Delphi allows for 
both models, where the Ansi model in turn suggests using one specific 
code page (CP_ACP) for best performance.

<Cassandra>
All in all, I have the impression that the known RawByteString flaws will 
never be fixed in Delphi, in order to encourage users to take the 
step to UnicodeString. Now the question is whether these flaws will be fixed 
in FPC, or whether Lazarus will become the first project that definitely 
requires a complete move to UnicodeString for reliable operation.
For best support of non-UTF-16 platforms I'd suggest fixing the flaws...
</Cassandra>

DoDi