[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Michael Schnell mschnell at lumino.de
Wed Nov 26 11:23:17 CET 2014


I fail to understand some of the text.

It seems unavoidable to use the name "AnsiString", even though I always 
cringe when I see something called "ANSI" that contains Unicode 
(e.g. "UTF8String = type AnsiString(CP_UTF8)").


Apparently the "bytes per character" setting is implicitly treated here 
as part of the "code page" definition. Correct?


In section "Dynamic code page":

"When assigning a string to a plain AnsiString (= AnsiString(CP_ACP)) or 
ShortString, the string data will however be converted to 
DefaultSystemCodePage. The dynamic code page of that AnsiString(CP_ACP) 
will then be the current value of DefaultSystemCodePage (e.g. 1250 for 
the Windows-1250 code page), even though its static code page is CP_ACP 
(which is a constant <> 1250). This is one example of how the static 
code page can differ from the dynamic code page. Subsequent sections 
will describe more such scenarios."
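If I read this correctly, the effect described there could be checked with 
a small test like the following. This is an untested sketch against the 
trunk RTL: StringCodePage and DefaultSystemCodePage are the RTL's, while 
DynamicCodePageAfterAnsiAssign is just a name I made up. According to the 
quoted text, the result should equal DefaultSystemCodePage even though the 
static code page of "a" is CP_ACP.

```pascal
{$mode objfpc}{$H+}

{ Hypothetical helper: assign a UTF8String to a plain AnsiString
  (static code page CP_ACP) and report the dynamic code page that
  the AnsiString ends up with. }
function DynamicCodePageAfterAnsiAssign(const u: UTF8String): TSystemCodePage;
var
  a: AnsiString;  { static code page CP_ACP }
begin
  a := u;         { per the wiki, the data is converted to DefaultSystemCodePage here }
  Result := StringCodePage(a);
end;
```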

1) A ShortString does not carry a code page annotation, so for it the 
statement "static code page can differ from the dynamic code page" does 
not seem to make much sense.

2) I fail to understand how, given this explanation (which seems to force 
automatic conversion for assignments between types with different "code 
page" settings, including CP_ACP), the situation "static code page can 
differ from the dynamic code page" can arise at all.

In fact, this disaster does seem possible (see section "RawByteString"): 
assign a string with static code page X1 to a RawByteString (hence no 
conversion), then assign that RawByteString to a string with static code 
page X2 (again no conversion). I assume that such mismatched strings 
cannot be produced without abusing RawByteString; otherwise this would be 
rather disastrous for normal users.
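The scenario I mean, as an untested sketch: Win1252String, PassThroughRaw 
and DynamicCodePageAfterRoundTrip are names I made up, and 1252 just 
stands for some X2 different from X1 = CP_UTF8. The first step (no 
conversion when assigning to a RawByteString) is what the wiki describes; 
what the second assignment does is exactly my question, so I don't claim 
a particular result for DynamicCodePageAfterRoundTrip. If it returned 
65001, "w" would be one of those mismatched strings (static 1252, 
dynamic 65001).

```pascal
{$mode objfpc}{$H+}

type
  { hypothetical target type with a different static code page (X2) }
  Win1252String = type AnsiString(1252);

{ Step 1: assigning to a RawByteString performs no conversion, so the
  result keeps the dynamic code page of its source. }
function PassThroughRaw(const s: UTF8String): RawByteString;
begin
  Result := s;
end;

{ Step 2: assign that RawByteString to a string with another static
  code page and report the dynamic code page the target ends up with.
  Whether this converts or merely copies the bytes is the open question. }
function DynamicCodePageAfterRoundTrip(const s: UTF8String): TSystemCodePage;
var
  w: Win1252String;
begin
  w := PassThroughRaw(s);
  Result := StringCodePage(w);
end;
```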



In section "RawByteString":

"the results of conversions from/to the CP_NONE code page are undefined."

In effect, the behavior is defined quite precisely in that same section, 
under "As a first approximation". Does that mean it is going to change? 
Is there a reason not to keep the described behavior (simply never do any 
conversion)? Of course this can produce mismatched strings. Is that a big 
problem? If yes, I think assigning a RawByteString to a string with a 
static code page should either be forbidden at compile time or result in 
a runtime error if the code pages do not match.
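For the runtime variant, I have something along these lines in mind. It is 
purely hypothetical: neither ECodePageMismatch nor CheckedFromRaw exists in 
the RTL, and a real check would be inserted by the compiler at the 
assignment rather than called by hand. Only StringCodePage and 
TSystemCodePage are taken from the RTL.

```pascal
{$mode objfpc}{$H+}
uses
  SysUtils;

type
  { hypothetical exception for the proposed runtime check }
  ECodePageMismatch = class(Exception);

{ Hypothetical checked assignment: hand the data over unchanged if its
  dynamic code page matches the target's static code page, otherwise
  raise instead of silently producing a mismatched string.
  Empty strings carry no code page, so they are passed through. }
function CheckedFromRaw(const r: RawByteString;
  target: TSystemCodePage): RawByteString;
begin
  if (r <> '') and (StringCodePage(r) <> target) then
    raise ECodePageMismatch.CreateFmt(
      'dynamic code page %d does not match static code page %d',
      [StringCodePage(r), target]);
  Result := r;
end;
```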

-Michael


