[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Nov 26 23:41:41 CET 2014


Jonas Maebe wrote:

> Technically, that section literally states that they will be
> concatenated without data loss and that the result is then converted to
> the target string's encoding (except in case the target is
> RawByteString). How that is implemented exactly is undefined; again in
> the meaning of "undefined", not in the meaning of "undefined when
> defined as meaning X".

In this case the implementation is "compiler specific", somewhat 
different from "undefined" (in a RawByteString):
"CP_NONE: this value indicates that no code page information has been 
associated with the string data. The result of any explicit or implicit 
operation that converts this data to another code page is undefined."

IMO the result is well defined: it's the string with the encoding of 
that "other" codepage. An "undefined" result, as I understand it, would 
mean "the result can be anything, unrelated to the function input".

Likewise, the branch taken by an IF statement is not "undefined" merely 
because it depends on the actual value of the condition.

The value of a local variable initially is "undefined", i.e. can be any 
value. But after an assignment it *is* defined, even if that value still 
may be *unpredictable* by static code analysis.

IMO a better wording should be found, one that does not cause the 
confusion now evident among some readers.


>> Regarding RawByteStrings there has been the definition "a RawByteString
>> has exactly the same behavior as assigning that AnsiString(X) to another
>> AnsiString(X) variable with the same value of X: no code page conversion
>> or copying occurs". Seemingly this is not true for the intermediate
>> results of concatenations.
> 
> That paragraph only specifies that code page-aware strings are
> concatenated without data loss, and then defines to which code page the
> result will be converted before assigning it to the target.

What's the meaning of "no copying occurs"? Of course the reference to 
the string is copied into the target variable!

What's "the same value of X", in case of AnsiString(CP_ACP) and 
AnsiString(DefaultSystemCodePage)?
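
To make the CP_ACP question concrete, here is a minimal sketch that just 
prints the values in question. It assumes a compiler with code page aware 
AnsiStrings (FPC 2.7.1 or later); I make no claim about what it prints on 
a particular platform, which is exactly the point that needs documenting.

```pascal
program CpAcpDemo;
{$mode objfpc}{$H+}

var
  S: AnsiString;   // static encoding: CP_ACP
  U: UTF8String;   // static encoding: CP_UTF8
begin
  U := 'abc';
  S := U;          // assignment between different static encodings

  // The constant CP_ACP versus the run-time default code page
  WriteLn('CP_ACP                = ', CP_ACP);
  WriteLn('DefaultSystemCodePage = ', DefaultSystemCodePage);

  // Dynamic code page actually stored with the string data
  WriteLn('StringCodePage(S)     = ', StringCodePage(S));
end.
```

If the last value equals DefaultSystemCodePage rather than CP_ACP, then 
"the same value of X" cannot refer to the declared X alone.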


> Even if the intermediary result of a concatenation would be a
> RawByteString (which is not stated nor necessarily ever the case), then
> the above would apply and hence the (dynamic) code page of that
> RawByteString would be the one as defined by the above-mentioned rules
> before it would be assigned to the target.

Please note that the other statements refer to *static* encodings, 
hence my question about the (assumed) static encoding of an 
intermediate result. When the compiler inserts a conversion request 
based on *static* encodings, will it or will it not insert such a 
request before an intermediate result is assigned to the target variable?
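
For readers unsure about the static/dynamic distinction, a small sketch 
follows. The type name TCP866String is made up for this example, and it 
again assumes FPC 2.7.1 or later. The static encoding is what the 
declaration says; the dynamic encoding is what StringCodePage reports at 
run time.

```pascal
program StaticVsDynamic;
{$mode objfpc}{$H+}

type
  // Hypothetical type name; static encoding is code page 866
  TCP866String = type AnsiString(866);

var
  A: TCP866String;
  R: RawByteString;   // static encoding: CP_NONE
begin
  A := 'abc';
  R := A;   // CP_NONE target: no conversion is requested for this assignment

  // Both calls report the dynamic code page stored with the string data
  WriteLn('dynamic code page of A: ', StringCodePage(A));
  WriteLn('dynamic code page of R: ', StringCodePage(R));
end.
```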


Suggestion:

"During string operations the source strings are converted [to CP_ACP?] 
when they have a different [dynamic?] encoding. When the result is 
stored in a variable, it is converted as required by the static encoding 
of the target."

Where "as required" means that a static target encoding of CP_ACP is 
replaced by the DefaultSystemCodePage, while CP_NONE does not require a 
conversion.

The CP_ACP case should be clarified as well, because it's unclear 
whether CP_ACP(=0) is *considered* equal to the current 
DefaultSystemCodePage, even if both values are *always* different (see 
above). The use of "CP_ACP" instead of "DefaultSystemCodePage" can be 
confusing and should be avoided or clarified beforehand.

Perhaps it would help to concentrate on the following steps:
1) (string) operand fetch
2) (string) operations
3) (string) assignment

1) Fetching an operand removes any information about the static encoding 
of the source, only its dynamic encoding persists.
[Now the handling of non-AnsiString sources can be explained, like for 
literals, ShortString etc.
RawByteString is not special here, it's only a static encoding.
]

2) String operations take into account the dynamic encoding of their 
operands, with lossless conversions inserted as required.

3) When a string is assigned to a variable, it is converted, if 
necessary, as required by the static encoding of the target, with 
possible data loss.
[about "required" see above.
Special case: when the source is a variable, no conversion occurs when 
the *static* source and target types are "compatible".
What exactly is compatible with CP_ACP?
]
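
The three steps above can be observed with a sketch like the following. 
The type names are invented for the example, and FPC 2.7.1 or later is 
assumed. It concatenates two operands with different static encodings and 
stores the result once in a UTF8String and once in a RawByteString; what 
the second line prints is precisely the open question about intermediate 
results, so I give no expected output.

```pascal
program ConcatDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical type names for two different static encodings
  TCP1252String = type AnsiString(1252);
  TCP866String  = type AnsiString(866);

var
  A: TCP1252String;
  B: TCP866String;
  T: UTF8String;      // static encoding: CP_UTF8
  R: RawByteString;   // static encoding: CP_NONE
begin
  A := 'abc';
  B := 'def';

  T := A + B;   // operand fetch, concatenation, assignment to a CP_UTF8 target
  R := A + B;   // same operands, but the target requires no conversion

  WriteLn('dynamic code page of T: ', StringCodePage(T));
  WriteLn('dynamic code page of R: ', StringCodePage(R));
end.
```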

DoDi



