[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Wed Dec 3 00:52:45 CET 2014


Michael Schnell schrieb:
> On 11/29/2014 07:55 AM, Jonas Maebe wrote:
>> Exactly the same goes for converting strings with code page CP_NONE to 
>> a different code page: your program is broken when it tries to do that,
> 
> While accessing an array beyond its bounds is not detectable at compile 
> time, and accessing an array beyond its bounds when range checking is 
> switched off is technically not detectable at runtime, so that 
> *undefined* behaviour can't be avoided there, the attempt to convert 
> strings with code page CP_NONE to a different code page is easily 
> detectable by the compiler, as we have predefined string type "brands" 
> here. Thus, if the outcome is *defined* *to* *be* *undefined*, it can 
> and should result in a compiler error message.

You forget that Jonas refers to *dynamic* string encodings, unknown at 
compile time. At runtime the dynamic encoding of every string is stored 
together with the string data, like the size of dynamic arrays is stored 
together with the array data.
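
To make the point concrete, here is a small model in Python. It is not FPC 
or Delphi code; TaggedString, read_from_somewhere and can_convert are names 
made up for this illustration. The code page travels in a header next to the 
payload, so whether a conversion is legal can only be decided by looking at 
the actual value at runtime.

```python
from dataclasses import dataclass

CP_NONE = 0xFFFF   # "no code page", the value mentioned in this thread
CP_UTF8 = 65001


@dataclass
class TaggedString:
    """Hypothetical stand-in for an AnsiString: payload plus runtime code page."""
    code_page: int
    data: bytes


def read_from_somewhere(raw: bytes, code_page: int) -> TaggedString:
    # The code page is chosen by whoever produced the bytes, at runtime.
    return TaggedString(code_page, raw)


def can_convert(s: TaggedString) -> bool:
    # The check needs the header of the actual value; the declared type
    # alone does not guarantee which code page a given value carries.
    return s.code_page != CP_NONE


a = read_from_somewhere(b"abc", CP_UTF8)
b = read_from_somewhere(b"abc", CP_NONE)
print(can_convert(a), can_convert(b))
```

Both values have the same static shape here; only the stored header 
distinguishes them, which is why a compiler error cannot cover this case.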

In Delphi *no* string can have a dynamic encoding of CP_NONE or CP_ACP, 
so nothing can be broken. In fact all CP_xxx constants are private 
in System.pas; they are not available to user or library code. 
SetCodePage (i.e. the RTL/OS function for casting AnsiString into 
UnicodeString) replaces 0 (CP_ACP) by DefaultSystemCodePage before a 
conversion, and returns an empty string for an unknown target codepage, 
like $FFFF (CP_NONE).

For the curious: for the exact behaviour of SetCodePage see 
MultiByteToWideChar (on Windows) and UnicodeFromLocaleChars (on POSIX), 
which are ultimately used by Delphi to perform (the first step of) an 
encoding conversion.
For MultiByteToWideChar see the list of allowed CP_xxx constants, as 
#defined in windows.h, how they are replaced, and what shit may happen 
to your strings when using them. The function returns 0 if it does not 
succeed; since this result is also used to determine the required buffer 
size (the length of the resulting string), the resulting string is then 
empty.
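
The behaviour described above can be sketched roughly as follows, again as 
a Python model rather than the actual RTL or OS code. required_length and 
to_unicode are hypothetical names, the table of known code pages is 
deliberately tiny, and the default code page of 1252 is an assumption made 
only for this example.

```python
CP_ACP = 0
CP_NONE = 0xFFFF

# Assumption for this sketch: the system default is Windows-1252.
DEFAULT_SYSTEM_CODE_PAGE = 1252

# Tiny lookup standing in for the code pages the OS knows about.
KNOWN_CODECS = {1252: "cp1252", 65001: "utf-8"}


def required_length(data: bytes, code_page: int) -> int:
    """Size query: UTF-16 code units needed, or 0 if the call fails."""
    codec = KNOWN_CODECS.get(code_page)
    if codec is None:
        return 0  # unknown code page: the model reports failure as 0
    text = data.decode(codec, errors="replace")
    return len(text.encode("utf-16-le")) // 2


def to_unicode(data: bytes, code_page: int) -> str:
    # Step 1: CP_ACP is replaced by the default code page before converting.
    if code_page == CP_ACP:
        code_page = DEFAULT_SYSTEM_CODE_PAGE
    # Step 2: the size query doubles as the error indicator.
    n = required_length(data, code_page)
    if n == 0:
        return ""  # a failure is indistinguishable from an empty result
    return data.decode(KNOWN_CODECS[code_page], errors="replace")


print(repr(to_unicode(b"caf\xe9", CP_ACP)))
print(repr(to_unicode(b"abc", CP_NONE)))
```

Note how the caller never sees an error: because the failure value 0 is 
reused as the buffer length, an unknown code page silently yields an empty 
string.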

DoDi



