[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
DrDiettrich1 at aol.com
Thu Nov 27 15:44:08 CET 2014
Michael Schnell wrote:
> I now understand that the "Element Size" field in the String header is
> quite dummy, as under the hood there are two completely separate
> concepts for one-byte-Strings and 2-Byte Strings and none for other
> Element sizes.
After a code review I realized that the element size field is specific
to dynamic strings, not present in dynamic arrays. Since the element
size is bound to the string type, it could be omitted in the FPC
implementation. [With little gain, if the record alignment is preserved.]
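
To illustrate the alignment remark, here is a minimal sketch in Python
(ctypes). The field names, types and ordering are assumptions chosen
for illustration only; this is not the actual FPC string header record.
It only shows that dropping a 2-byte field placed in front of
pointer-sized fields need not shrink the header, because padding takes
its place.

```python
import ctypes


class HeaderWithElementSize(ctypes.Structure):
    # Hypothetical layout, NOT the real FPC record: two 16-bit fields
    # followed by two pointer-sized fields.
    _fields_ = [
        ("code_page", ctypes.c_uint16),
        ("element_size", ctypes.c_uint16),
        ("ref_count", ctypes.c_ssize_t),
        ("length", ctypes.c_ssize_t),
    ]


class HeaderWithoutElementSize(ctypes.Structure):
    # Same hypothetical layout with the element size field omitted.
    _fields_ = [
        ("code_page", ctypes.c_uint16),
        ("ref_count", ctypes.c_ssize_t),
        ("length", ctypes.c_ssize_t),
    ]


size_with = ctypes.sizeof(HeaderWithElementSize)
size_without = ctypes.sizeof(HeaderWithoutElementSize)

# With natural alignment the compiler pads up to the next pointer-sized
# boundary, so both sizes are expected to match on common platforms.
print(size_with, size_without)
```
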
> This to me is not obvious at all, as the language syntax and the String
> header data structure suggest a more universal paradigm for multiple
> string type brands, that each have an "element-size" and
> "code-ID-number" setting, handled by a common infrastructure.
This may have been envisaged by the Delphi architects, but was not
> The "universal paradigm" would allow for extensions (e.g. UTF-32,
> multiple 16 Bit Code pages, an additional fully dynamic String type,
> n-byte "un-encoded" string types), as I described in the Wiki page.
Even if feasible, such arbitrary string storage can dramatically
increase the number of implicit string conversions. An *efficient*
implementation would be based on a single program-wide string
representation, with different encodings being handled only in an
exchange with external data sources.
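
The single-representation model described above can be sketched as
follows. This is an illustration in Python, not FPC code; the helper
names read_external and write_external are hypothetical. External
bytes are decoded once on input, all processing happens on the one
internal string type, and encoding happens once on output.

```python
def read_external(raw: bytes, encoding: str) -> str:
    """Decode external data once, at the input boundary (hypothetical helper)."""
    return raw.decode(encoding)


def write_external(text: str, encoding: str) -> bytes:
    """Encode once, at the output boundary (hypothetical helper)."""
    return text.encode(encoding)


# Data arriving from an external source in a single-byte code page.
latin1_input = "café".encode("latin-1")

# Inside the program only one string representation is used, so
# concatenation and comparison need no implicit conversions.
text = read_external(latin1_input, "latin-1")
message = text + "!"

# Conversion to another encoding happens only when data leaves the program.
utf8_output = write_external(message, "utf-8")
```
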
That standard encoding may be Ansi or Unicode; even Delphi allows for
both models, where Ansi again suggests the use of one specific codepage
(CP_ACP) for best performance.
After all I have the impression that the known RawByteString flaws will
never be fixed in Delphi, in order to encourage the users to take the
step to UnicodeString. Now the question is whether these flaws are fixed
in FPC, or whether Lazarus will become the first project that definitely
requires a complete move to UnicodeString for reliable operation.
For best support of non-UTF-16 platforms I'd suggest fixing the flaws...