[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Michael Schnell
mschnell at lumino.de
Thu Nov 27 11:24:55 CET 2014
On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
>
> An AnsiString consists of AnsiChar's. The *meaning* of these char's
> (bytes) depends on their encoding, regardless of whether the used
> encoding is or is not stored with the string.
I understand that the implementation (in Delphi) seems to be driven more
by the wording ("ANSI") than by the logical paradigm the language syntax
suggests. The language syntax and the string header fields suggest that
both the element-size and the code-ID-number need to be adhered to (be it
statically or dynamically, depending on the usage instance). E.g. there
are at least two "code pages" for UTF-16 ("LE" and "BE") that would
be worth supporting.
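
To illustrate why the byte order has to be part of the encoding
information, here is a minimal sketch in Python (not FPC code; the
sample text is made up for illustration). The same two characters give
different byte sequences in UTF-16LE and UTF-16BE, and reading the
bytes with the wrong byte order silently yields different text. On
Windows these two encodings are commonly identified as code pages 1200
and 1201.

```python
# Same logical text, two UTF-16 byte orders.
text = "Aé"  # illustrative sample only

le = text.encode("utf-16-le")  # low byte of each 16-bit unit first
be = text.encode("utf-16-be")  # high byte of each 16-bit unit first

print(le.hex(" "))  # 41 00 e9 00
print(be.hex(" "))  # 00 41 00 e9

# Both decode back to the same text when the right byte order is used.
assert le.decode("utf-16-le") == text
assert be.decode("utf-16-be") == text

# With the wrong byte order the bytes still decode, but to other text.
wrong = le.decode("utf-16-be")
print(wrong == text)  # False
```

An element size of 2 alone therefore does not say how to interpret the
string; the ID of the byte order is needed as well.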
>
> It's essential to distinguish between low-level (physical) AnsiChar
> values, and *logical* characters possibly consisting of multiple
> AnsiChars.
I do see now that the implementation follows this concept. But
the language syntax and the string header fields suggest a more versatile
paradigm, providing a universal reference-counted "element string" type.
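
The distinction between physical elements and logical characters can
be sketched as follows. This is Python rather than Pascal, and the
sample word is only an assumption for illustration: the same five
characters occupy five bytes in a single-byte code page but seven
bytes in UTF-8, and the UTF-8 bytes mean something else entirely when
read as Windows-1252.

```python
text = "Grüße"  # five logical characters (illustrative sample)

as_cp1252 = text.encode("cp1252")  # one byte per character here
as_utf8 = text.encode("utf-8")     # "ü" and "ß" take two bytes each

print(len(text))       # count of logical characters
print(len(as_cp1252))  # count of bytes in Windows-1252
print(len(as_utf8))    # count of bytes in UTF-8

# The bytes alone carry no meaning: reading the UTF-8 bytes with the
# wrong encoding produces a different (garbled) character sequence.
misread = as_utf8.decode("cp1252")
print(misread == text)  # False
```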
>
> That's why I wonder *when* exactly the result of such an expression
> *is* converted (implicitly) into the static encoding of the target
> variable, and when *not*.
I understand that the idea is to use the static encoding information
provided by the type definition whenever possible. I understand that if
no RawByteString is involved in the operation, the static encoding
information is sufficient and hence the potential calls to the dedicated
conversion library functions can be determined completely at compile
time. In Delphi the use of the dynamic encoding information seems to be
very rare (and the implementation does not make much sense to me).
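
A toy model of that idea, in Python: it is not how the Delphi or FPC
RTL is implemented, and the names EncodedBytes, RAW and assign are
hypothetical. Each value carries its encoding, each target variable
has a declared encoding, and a conversion happens on assignment only
if the two differ. A "raw" target accepts any value unchanged.

```python
from dataclasses import dataclass

RAW = None  # hypothetical marker for a "raw", encoding-agnostic target


@dataclass(frozen=True)
class EncodedBytes:
    data: bytes  # physical bytes
    codec: str   # encoding stored together with the bytes


def assign(value, target_codec):
    """Model assigning `value` to a variable declared as `target_codec`."""
    if target_codec is RAW or value.codec == target_codec:
        return value  # no conversion: bytes and encoding are kept
    text = value.data.decode(value.codec)
    return EncodedBytes(text.encode(target_codec), target_codec)


src = EncodedBytes("é".encode("utf-8"), "utf-8")
converted = assign(src, "cp1252")  # encodings differ: bytes are converted
kept = assign(src, RAW)            # raw target: value passes through

print(converted.data.hex(), converted.codec)
print(kept is src)
```

When both encodings are known from the declarations, the decision made
inside assign could be taken once at compile time; only the raw case
needs the encoding stored with the data.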
> The entire mess results from the bad interpretation of RawByteString
> assignments, which IMO was well thought by the Delphi language
> architects, but not understood by the Delphi compiler coders.
I fully agree with you.
I suppose the original idea was to create an (additional) fully dynamic
type brand: whenever it is used, the compiler needs to read the
dynamic encoding information (both element-size and encoding-ID-number)
and act appropriately. With that decently implemented, TStrings
and similar classes could in fact use this type for universal handling
of all String type brands.
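
What such a fully dynamic type might look like can be sketched in
Python. Everything here is an assumption made for illustration:
DynString, CODECS and concat are hypothetical names, and the numeric
IDs are the Windows code page numbers for UTF-8 (65001), Windows-1252
(1252), UTF-16LE (1200) and UTF-16BE (1201). Each string carries both
its element size and its encoding ID, and every operation reads them
at run time.

```python
from dataclasses import dataclass

# Hypothetical table: encoding ID -> (Python codec name, element size)
CODECS = {
    65001: ("utf-8", 1),
    1252: ("cp1252", 1),
    1200: ("utf-16-le", 2),
    1201: ("utf-16-be", 2),
}


@dataclass(frozen=True)
class DynString:
    code_page: int  # encoding ID stored in the "header"
    elem_size: int  # bytes per element stored in the "header"
    payload: bytes

    @classmethod
    def from_text(cls, text, code_page):
        codec, size = CODECS[code_page]
        return cls(code_page, size, text.encode(codec))

    def element_count(self):
        return len(self.payload) // self.elem_size

    def to_text(self):
        return self.payload.decode(CODECS[self.code_page][0])


def concat(a, b):
    """Concatenate; the result takes the encoding of the left operand."""
    if a.code_page == b.code_page:
        return DynString(a.code_page, a.elem_size, a.payload + b.payload)
    return DynString.from_text(a.to_text() + b.to_text(), a.code_page)


left = DynString.from_text("Grü", 65001)  # UTF-8, 1-byte elements
right = DynString.from_text("ße", 1200)   # UTF-16LE, 2-byte elements
joined = concat(left, right)

print(joined.to_text(), joined.code_page, joined.element_count())
```

A container in the spirit of TStrings could then hold such values
without knowing their encoding in advance. Letting the left operand
decide the result encoding is only one possible design choice.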
My hope was that fpc might be able to correct this error of the Delphi
compiler coders. Of course, for Delphi compatibility the type name
RawByteString and the code-ID-number $FFFF can't be used for this any
more, so a new name and ID number would need to be invented. IMHO this
is in fact possible and viable (see wiki page for details).
-Michael