[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
Hans-Peter Diettrich
DrDiettrich1 at aol.com
Wed Nov 26 23:41:41 CET 2014
Jonas Maebe wrote:
> Technically, that section literally states that they will be
> concatenated without data loss and that the result is then converted to
> the target string's encoding (except in case the target is
> RawByteString). How that is implemented exactly is undefined; again in
> the meaning of "undefined", not in the meaning of "undefined when
> defined as meaning X".
In this case the implementation is "compiler-specific", which is somewhat
different from "undefined" as the wiki uses it for RawByteString:
"CP_NONE: this value indicates that no code page information has been
associated with the string data. The result of any explicit or implicit
operation that converts this data to another code page is undefined."
IMO the result is well defined: it is the string, encoded in that "other"
code page. An "undefined" result, as I understand it, would mean "the
result can be anything, unrelated to the function input".
Likewise, the branch taken when an IF statement executes is not
"undefined" merely because it depends on the actual value of the condition.
The initial value of a local variable is "undefined", i.e. it can be any
value. After an assignment, however, it *is* defined, even if static code
analysis still cannot *predict* that value.
IMO a better wording should be found, one that does not cause the
obvious confusion some readers currently have.
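To illustrate the difference between "undefined" and "unpredictable", here is a small Python model (not FPC code, and not how FPC implements the conversion). It treats CP_NONE data as bytes of unknown origin and converts them to UTF-8 under two assumed source code pages. cp1252 and cp437 are arbitrary choices, and `convert_assuming` is a hypothetical helper.

```python
# Python model, not FPC code: bytes without code page information (CP_NONE)
# are converted to UTF-8 under two different *assumed* source code pages.
RAW = b"Caf\xe9"  # hypothetical string data whose encoding is unknown


def convert_assuming(raw: bytes, assumed_source: str, target: str = "utf-8") -> bytes:
    """Interpret raw as assumed_source, then re-encode it in the target code page."""
    return raw.decode(assumed_source).encode(target)


as_cp1252 = convert_assuming(RAW, "cp1252")
as_cp437 = convert_assuming(RAW, "cp437")

# Each result is well-formed UTF-8 and repeatable for the same assumption ...
assert as_cp1252 == convert_assuming(RAW, "cp1252")
# ... but its content depends on which source code page was assumed.
print(as_cp1252.decode("utf-8"))  # Café
print(as_cp437.decode("utf-8"))   # CafΘ
```

In this model the result is always a valid string in the target encoding; only its content depends on the assumption made about the source.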
>> Regarding RawByteStrings there has been the definition "a RawByteString
>> has exactly the same behavior as assigning that AnsiString(X) to another
>> AnsiString(X) variable with the same value of X: no code page conversion
>> or copying occurs". Seemingly this is not true for the intermediate
>> results of concatenations.
>
> That paragraph only specifies that code page-aware strings are
> concatenated without data loss, and then defines to which code page the
> result will be converted before assigning it to the target.
What's the meaning of "no copying occurs"? Of course the reference to
the string is copied into the target variable!
What is "the same value of X" in the case of AnsiString(CP_ACP) and
AnsiString(DefaultSystemCodePage)?
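One possible reading of "no copying occurs", sketched in Python as a model rather than FPC code: AnsiStrings are reference-counted, so an assignment without conversion copies only the reference and shares the character data, while a conversion allocates a new data block. `AnsiData`, `assign` and the CP_NONE sentinel are hypothetical, and the model deliberately leaves out the CP_ACP question.

```python
from dataclasses import dataclass

CP_NONE = 0xFFFF  # sentinel standing in for CP_NONE (RawByteString)


@dataclass(eq=False)
class AnsiData:
    """Hypothetical model of a reference-counted AnsiString data block."""
    payload: bytes
    codepage: int  # dynamic code page stored with the data


def assign(src: AnsiData, target_static_cp: int) -> AnsiData:
    """Model assigning src to a variable declared as AnsiString(target_static_cp)."""
    if target_static_cp in (src.codepage, CP_NONE):
        # No conversion: only the reference is copied; the data block is shared.
        return src
    # Conversion: a new block holding the re-encoded data is allocated.
    text = src.payload.decode(f"cp{src.codepage}")
    return AnsiData(text.encode(f"cp{target_static_cp}"), target_static_cp)


s = AnsiData(b"\xe9", 1252)  # "é" in code page 1252
same = assign(s, 1252)       # AnsiString(1252) := AnsiString(1252)
raw = assign(s, CP_NONE)     # RawByteString := AnsiString(1252)
dos = assign(s, 437)         # AnsiString(437) := AnsiString(1252)
print(same is s, raw is s, dos is s)  # True True False
print(dos.payload, dos.codepage)      # b'\x82' 437
```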
> Even if the intermediary result of a concatenation would be a
> RawByteString (which is not stated nor necessarily ever the case), then
> the above would apply and hence the (dynamic) code page of that
> RawByteString would be the one as defined by the above-mentioned rules
> before it would be assigned to the target.
Please note that the other statements refer to *static* encodings,
hence my question about the (assumed) static encoding of an
intermediate result. When the compiler inserts a conversion request
based on *static* encodings, will it or will it not insert such a
request before an intermediate result is assigned to the target variable?
Suggestion:
"During string operations the source strings are converted [to CP_ACP?]
when they have a different [dynamic?] encoding. When the result is
stored in a variable, it is converted as required by the static encoding
of the target."
Here "as required" means that a static target encoding of CP_ACP is
replaced by DefaultSystemCodePage, while CP_NONE does not require any
conversion.
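The "as required" rule from the suggestion can be written as a small Python decision function. This models the proposed wording only, not FPC's actual code; the function names are hypothetical and CP_NONE is represented by a sentinel value.

```python
from typing import Optional

CP_ACP = 0        # FPC's CP_ACP
CP_NONE = 0xFFFF  # sentinel standing in for CP_NONE (RawByteString)


def conversion_target(static_target_cp: int, default_system_cp: int) -> Optional[int]:
    """Code page to convert to when storing into the target, or None for no conversion."""
    if static_target_cp == CP_NONE:
        return None  # CP_NONE never requires a conversion
    if static_target_cp == CP_ACP:
        return default_system_cp  # CP_ACP is replaced by DefaultSystemCodePage
    return static_target_cp


def needs_conversion(dynamic_src_cp: int, static_target_cp: int, default_system_cp: int) -> bool:
    """True if a value with dynamic_src_cp must be converted for this target."""
    target = conversion_target(static_target_cp, default_system_cp)
    return target is not None and target != dynamic_src_cp


print(needs_conversion(1252, CP_ACP, 1252))    # False: CP_ACP resolves to 1252
print(needs_conversion(65001, CP_ACP, 1252))   # True: UTF-8 data, 1252 target
print(needs_conversion(65001, CP_NONE, 1252))  # False: RawByteString keeps the data
```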
The CP_ACP case should be clarified as well, because it is unclear
whether CP_ACP (=0) is *considered* equal to the current
DefaultSystemCodePage, even though the two values are *always* different
(see above). Using "CP_ACP" instead of "DefaultSystemCodePage" can be
confusing and should either be avoided or be explained beforehand.
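The CP_ACP ambiguity boils down to two possible notions of equality. In this hypothetical Python sketch, comparing the raw numbers never matches, while resolving CP_ACP to DefaultSystemCodePage first does. The value 1252 for DefaultSystemCodePage is an assumption for the example.

```python
CP_ACP = 0  # FPC's CP_ACP


def same_code_page(x: int, y: int, default_system_cp: int) -> bool:
    """Hypothetical comparison that resolves CP_ACP before comparing."""
    def resolve(cp: int) -> int:
        return default_system_cp if cp == CP_ACP else cp
    return resolve(x) == resolve(y)


default_system_cp = 1252  # assumed DefaultSystemCodePage
print(CP_ACP == default_system_cp)                       # False: the numbers differ
print(same_code_page(CP_ACP, 1252, default_system_cp))   # True: equal after resolving
print(same_code_page(CP_ACP, 65001, default_system_cp))  # False
```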
Perhaps it would help to concentrate on the following steps:
1) (string) operand fetch
2) (string) operations
3) (string) assignment
1) Fetching an operand removes any information about the static encoding
of the source; only its dynamic encoding persists.
[This is where the handling of non-AnsiString sources, such as literals,
ShortString etc., can be explained.
RawByteString is not special here; it is only a static encoding.
]
2) String operations take into account the dynamic encoding of their
operands, with lossless conversions inserted as required.
3) When a string is assigned to a variable, it is converted if the static
encoding of the target requires it, possibly with data loss.
[For "required", see above.
Special case: when the source is a variable, no conversion occurs when
the *static* source and target types are "compatible".
What exactly is compatible with CP_ACP?
]
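The three steps can be modeled in Python as follows. This is a sketch under stated assumptions, not FPC's implementation: Unicode text stands in for the lossless intermediate, unconvertible characters become "?", a RawByteString target is assumed to receive DefaultSystemCodePage, and "compatible" is assumed to mean "equal after resolving CP_ACP". All names are hypothetical.

```python
from dataclasses import dataclass

CP_ACP = 0        # FPC's CP_ACP
CP_NONE = 0xFFFF  # sentinel standing in for CP_NONE (RawByteString)


@dataclass(frozen=True)
class Operand:
    """What survives step 1: the bytes and their dynamic code page."""
    payload: bytes
    dynamic_cp: int

    def text(self) -> str:
        return self.payload.decode(f"cp{self.dynamic_cp}")


@dataclass
class Variable:
    """Hypothetical string variable with a static (declared) code page."""
    static_cp: int
    value: Operand


def resolve(cp: int, default_cp: int) -> int:
    """Replace CP_ACP by DefaultSystemCodePage; leave other values alone."""
    return default_cp if cp == CP_ACP else cp


def fetch(var: Variable) -> Operand:
    # 1) Operand fetch: the static code page of var is not passed on.
    return var.value


def concat(*ops: Operand) -> str:
    # 2) Operation: decode every operand losslessly; Unicode is the intermediate.
    return "".join(op.text() for op in ops)


def store(target: Variable, result: str, default_cp: int) -> None:
    # 3) Assignment: convert to the target's static code page, possibly lossy.
    # Assumption: a RawByteString (CP_NONE) target receives DefaultSystemCodePage.
    cp = default_cp if target.static_cp == CP_NONE else resolve(target.static_cp, default_cp)
    target.value = Operand(result.encode(f"cp{cp}", errors="replace"), cp)


def assign_var(target: Variable, source: Variable, default_cp: int) -> None:
    # Special case: between "compatible" static code pages (assumed: equal after
    # resolving CP_ACP, or a RawByteString target) the data is shared unconverted.
    same = resolve(target.static_cp, default_cp) == resolve(source.static_cp, default_cp)
    if same or target.static_cp == CP_NONE:
        target.value = source.value
    else:
        store(target, fetch(source).text(), default_cp)


DEFAULT_CP = 1252  # assumed DefaultSystemCodePage
a = Variable(1252, Operand(b"caf\xe9", 1252))  # "café" in code page 1252
r = Variable(866, Operand(b"\xe9", 866))       # a Cyrillic letter in code page 866
t = Variable(1252, Operand(b"", 1252))
store(t, concat(fetch(a), fetch(r)), DEFAULT_CP)
print(t.value.payload)  # b'caf\xe9?' (the Cyrillic letter is lost in step 3)

u = Variable(CP_ACP, Operand(b"", DEFAULT_CP))
assign_var(u, a, DEFAULT_CP)
print(u.value is a.value)  # True: data shared, no conversion
```

The concatenation keeps the Cyrillic character until step 3, where storing into a code page 1252 variable replaces it. The variable-to-variable assignment into an AnsiString(CP_ACP) shares the data because, in this model, both static code pages resolve to 1252.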
DoDi