[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Thu Nov 27 11:24:55 CET 2014

On 11/26/2014 06:37 PM, Hans-Peter Diettrich wrote:
>
> An AnsiString consists of AnsiChars. The *meaning* of these chars
> (bytes) depends on their encoding, regardless of whether the used
> encoding is or is not stored with the string.
I understand that the implementation (in Delphi) seems to be driven more
by the wording ("ANSI") than by the logical paradigm the language syntax
suggests. The language syntax and the string header fields suggest that
both the element size and the code page ID number need to be adhered to
(statically or dynamically, depending on the usage). For example, there
are at least two "code pages" for UTF-16 ("LE" and "BE") that would be
worth supporting.
>
> It's essential to distinguish between low-level (physical) AnsiChar 
> values, and *logical* characters possibly consisting of multiple 
> AnsiChars.
I now see that the implementation follows this concept. But the language
syntax and the string header fields suggest a more versatile paradigm: a
universal, reference-counted "element string" type.
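Here is a small Python model of the distinction between physical AnsiChar values and logical characters. Python stands in for Pascal only for illustration. The sample string and the choice of Windows-1252 as the "other" code page are my own assumptions. Bytes play the role of AnsiChars:

```python
# Bytes stand in for AnsiChar values; str holds logical characters.
text = "Grüße"                      # 5 logical characters
utf8_bytes = text.encode("utf-8")  # physical storage of a UTF-8 AnsiString

# 'ü' and 'ß' each need two bytes in UTF-8, so there are more
# physical elements than logical characters.
print(len(text), len(utf8_bytes))  # 5 7

# Reading the very same bytes with another code page (Windows-1252 here)
# yields different text: the meaning depends on the encoding used.
misread = utf8_bytes.decode("cp1252")
print(misread == text, len(misread))  # False 7
```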
>
> That's why I wonder *when* exactly the result of such an expression 
> *is* converted (implicitly) into the static encoding of the target 
> variable, and when *not*.
I understand that the idea is to use the static encoding information
provided by the type definition whenever possible. If no RawByteString
is involved in an operation, the static encoding information is
sufficient, so any calls to the dedicated conversion library functions
can be generated entirely at compile time.
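The following is a rough Python model of the assignment rule described above. It is not the Delphi or FPC RTL code. The class name `AnsiStr` and the mapping of code page numbers to Python codecs are assumptions made for this sketch. `CP_NONE = $FFFF` is the code page number Delphi and FPC use for RawByteString.

```python
from dataclasses import dataclass

CP_UTF8 = 65001
CP_1252 = 1252
CP_NONE = 0xFFFF  # code page number of RawByteString

# Code page number -> Python codec (an assumption of this model only).
CODECS = {CP_UTF8: "utf-8", CP_1252: "cp1252"}


@dataclass
class AnsiStr:
    data: bytes
    code_page: int  # dynamic code page, as stored in the string header


def assign(src: AnsiStr, declared_cp: int) -> AnsiStr:
    """Model assigning src to a variable whose declared (static) code page is declared_cp."""
    if declared_cp == CP_NONE:
        # RawByteString target: no conversion, the source keeps its code page.
        return AnsiStr(src.data, src.code_page)
    if declared_cp == src.code_page:
        # Same code page: plain copy, no conversion call needed.
        return AnsiStr(src.data, src.code_page)
    # Different code pages: a conversion call is required. When both code
    # pages come from the declared types, the compiler can decide this
    # statically; the model simply checks it at run time.
    text = src.data.decode(CODECS[src.code_page])
    return AnsiStr(text.encode(CODECS[declared_cp]), declared_cp)


utf8 = AnsiStr("Grüße".encode("utf-8"), CP_UTF8)
latin = assign(utf8, CP_1252)  # converted: 5 bytes in CP1252
raw = assign(utf8, CP_NONE)    # untouched: 7 bytes, still code page 65001
print(len(latin.data), latin.code_page, len(raw.data), raw.code_page)  # 5 1252 7 65001
```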

In Delphi, the dynamic encoding information seems to be used very rarely
(and the way it is used does not make much sense to me).

> The entire mess results from the bad interpretation of RawByteString
> assignments, which IMO was well thought out by the Delphi language
> architects, but not understood by the Delphi compiler coders.

I fully agree with you.

I suppose the original idea was to create an additional, fully dynamic
string type: whenever it is used, the compiler has to read the dynamic
encoding information (both the element size and the code page ID number)
and act accordingly. Implemented properly, TStrings and similar classes
could use this type to handle all string type variants universally.
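Here is a minimal Python sketch of what such a fully dynamic type could look like. It is hypothetical: the names `DynString`, `from_text`, `length` and `concat` are invented for illustration. The rule that a concatenation result takes the left operand's encoding is also my own assumption. Code pages 1200/1201 are the Windows identifiers for UTF-16 LE/BE. Every operation consults both header fields at run time:

```python
from dataclasses import dataclass

# Code page number -> codec and element size (assumptions of this model).
CODECS = {65001: "utf-8", 1252: "cp1252", 1200: "utf-16-le", 1201: "utf-16-be"}
ELEMENT_SIZE = {65001: 1, 1252: 1, 1200: 2, 1201: 2}


@dataclass
class DynString:
    """Hypothetical fully dynamic string; both header fields are read at run time."""
    data: bytes
    code_page: int
    element_size: int

    @classmethod
    def from_text(cls, text: str, code_page: int) -> "DynString":
        return cls(text.encode(CODECS[code_page]), code_page, ELEMENT_SIZE[code_page])

    def text(self) -> str:
        return self.data.decode(CODECS[self.code_page])

    def length(self) -> int:
        # Number of elements (byte or 16-bit units), not logical characters.
        return len(self.data) // self.element_size

    def concat(self, other: "DynString") -> "DynString":
        # Inspect the other operand's header; convert only if the code page differs.
        if other.code_page == self.code_page:
            return DynString(self.data + other.data, self.code_page, self.element_size)
        converted = other.text().encode(CODECS[self.code_page])
        return DynString(self.data + converted, self.code_page, self.element_size)


a = DynString.from_text("Hello, ", 1201)  # UTF-16 BE, element size 2
b = DynString.from_text("Grüße", 65001)   # UTF-8, element size 1 (7 bytes)
c = a.concat(b)
print(c.text() == "Hello, Grüße", c.code_page, c.length())  # True 1201 12
```

A container such as TStrings built on a type like this would not need to know the encoding of the strings it stores in advance, because each value carries it.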

My hope was that FPC might be able to correct this error of the Delphi
compiler coders. Of course, for Delphi compatibility, the type name
RawByteString and the code page ID number $FFFF can't be reused for
this; a new name and ID number would need to be invented. IMHO this is
in fact possible and viable (see the wiki page for details).

-Michael