[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"
mschnell at lumino.de
Wed Nov 26 11:23:17 CET 2014
I fail to understand some of the text.
It seems to be unavoidable to use the name "AnsiString", even though I
always cringe when I see something called "ANSI" that contains Unicode
(e.g. "UTF8String = type AnsiString(CP_UTF8)").
Apparently the "bytes per character" setting is implicitly considered
part of the "code page" definition here. Correct?
In section "Dynamic code page":
"When assigning a string to a plain AnsiString (= AnsiString(CP_ACP)) or
ShortString, the string data will however be converted to
DefaultSystemCodePage. The dynamic code page of that AnsiString(CP_ACP)
will then be the current value of DefaultSystemCodePage (e.g. 1250 for
the Windows-1250 code page), even though its static code page is CP_ACP
(which is a constant <> 1250). This is one example of how the static
code page can differ from the dynamic code page. Subsequent sections
will describe more such scenarios."
1) A ShortString does not carry a code page annotation, so for a
ShortString the statement "static code page can differ from the dynamic
code page" does not seem to make much sense.
2) I fail to understand how "the static code page can differ from the
dynamic code page" can happen, given this explanation, which seems to
force automatic conversion for assignments between types with different
"code page" settings (including CP_ACP).
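The case from the quoted paragraph can be tried with a small program. This is only a sketch, assuming a compiler with code page aware strings (FPC trunk, i.e. the upcoming 3.0) and the System unit routines StringCodePage and DefaultSystemCodePage; the program name is hypothetical. It prints the static code page of a plain AnsiString next to the dynamic code page it actually gets after an assignment.

```pascal
program StaticVsDynamicCP;

{$mode objfpc}{$H+}

var
  u: UTF8String;  // static code page CP_UTF8
  a: AnsiString;  // static code page CP_ACP (a placeholder, not a real code page)
begin
  u := 'abc';
  // Assigning to a plain AnsiString converts the data (if necessary) to
  // DefaultSystemCodePage, and the string is tagged with that code page.
  a := u;
  WriteLn('static  code page of a: ', CP_ACP);
  WriteLn('dynamic code page of a: ', StringCodePage(a));
  WriteLn('DefaultSystemCodePage:  ', DefaultSystemCodePage);
end.
```

So in this case the difference between static and dynamic code page only comes from CP_ACP being a placeholder that is resolved at run time.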
In fact this disaster (a string whose dynamic code page differs from its
static one) seems to be possible (see section "RawByteString") when
assigning a string with static code page X1 to a RawByteString (hence no
conversion) and then assigning that RawByteString to a string with
static code page X2 (again no conversion). I assume that such
"intersexual" strings can't be produced without abusing RawByteString;
otherwise this would be a serious problem for normal users.
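To check what actually happens in this round trip, a test program could print the code pages at each step. Again only a sketch, assuming FPC trunk (3.0) semantics: X1 = 1252 and X2 = CP_UTF8 are arbitrary choices, and the type name TCP1252String is hypothetical. It deliberately prints the resulting code page of s2 instead of claiming what it will be.

```pascal
program RawByteStringRoundTrip;

{$mode objfpc}{$H+}

type
  // Hypothetical string type with static code page X1 = 1252.
  TCP1252String = type AnsiString(1252);

var
  s1: TCP1252String;  // static code page X1 = 1252
  r: RawByteString;   // static code page CP_NONE
  s2: UTF8String;     // static code page X2 = CP_UTF8
begin
  s1 := 'abc';
  // Assigning to a RawByteString: no conversion, r keeps the dynamic
  // code page of s1.
  r := s1;
  WriteLn('dynamic code page of r:  ', StringCodePage(r));
  // Whether this assignment converts (or only retags) the data is exactly
  // the open question, so just print what the string ends up tagged with.
  s2 := r;
  WriteLn('static  code page of s2: ', CP_UTF8);
  WriteLn('dynamic code page of s2: ', StringCodePage(s2));
end.
```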
In section "RawByteString":
"the results of conversions from/to the CP_NONE code page are undefined."
Yet the behavior is described exactly in this same section, under "As a
first approximation". Does that mean it is going to change? Is there a
reason not to keep the described behavior (simply never do any
conversion)? Of course this can produce "intersexual" strings. Does that
do great harm? If so, I think assigning a RawByteString to a string with
a static code page should either be forbidden at compile time or result
in a runtime error if the code pages do not match.