[fpc-devel] Trying to understand the wiki-Page "FPC Unicode support"

Hans-Peter Diettrich DrDiettrich1 at aol.com
Sat Nov 29 17:36:16 CET 2014


Jonas Maebe wrote:
> On 28/11/14 21:30, Hans-Peter Diettrich wrote:
>> I prefer to specify and document everything *before* coding, so that
>> everybody can expect that the code will behave as specified.
> 
> If certain behaviour is explicitly undefined, it *is* specified and
> documented. It means that your program is buggy if it triggers such
> behaviour, and that the effect of triggering it could be anything.
[...]
> An example from FPC itself is accessing an array beyond its bounds when
> range checking is switched off.

After this hint I reviewed the "Code page identifiers" section again, and 
I think I have found the source of the misunderstandings.
 >>
CP_NONE: this value indicates that no code page information has been 
associated with the string data. The result of any explicit or implicit 
operation that converts this data to another code page is undefined.
<<
Does this mean "CP_NONE is not an allowed *dynamic* (string *data*) 
encoding", just like any other undefined encoding value?

In this case the description is correct, but it describes a special 
case of a general rule that is itself not yet *defined*: the rule about 
valid and invalid dynamic encodings. That general rule should be 
documented first, not only for CP_NONE. Documentation of the *intended* 
purpose of CP_NONE, as the *static* encoding of the RawByteString type, 
is also missing entirely.

As Delphi doesn't allow for a dynamic encoding of CP_NONE, I don't 
understand the purpose of the FPC description. In turn, some FPC 
developer might have misunderstood the (Delphi) handling of 
RawByteStrings, assuming that it was okay to omit the conversion when 
assigning a RawByteString to an AnsiString of a different encoding.

That's why I think that the incorrect handling of such RawByteString 
assignments in FPC should be fixed, according to the general rule for 
assignments to a string of a different (static) encoding. CP_NONE 
definitely *is* different from any other encoding, and Delphi does not 
define an exception for RawByteStrings.
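
To make the assignment in question concrete, here is a minimal sketch 
for FPC 2.7.1 or later. The type name TCp1252String and the byte values 
are only my illustration; SetCodePage and StringCodePage are the RTL 
routines. I deliberately give no expected output, because which output 
is correct is exactly the point under discussion:

```pascal
program RawByteAssign;
{$mode objfpc}{$H+}

type
  // Hypothetical name: an AnsiString with static code page 1252.
  TCp1252String = type AnsiString(1252);

var
  R: RawByteString;
  A: TCp1252String;
begin
  // Fill R with the two UTF-8 bytes of a-umlaut (U+00E4) byte by byte,
  // so that no string-literal conversion interferes.
  SetLength(R, 2);
  R[1] := #$C3;
  R[2] := #$A4;
  // Tag the bytes as UTF-8; Convert = False leaves the bytes untouched.
  SetCodePage(R, CP_UTF8, False);

  // The assignment under discussion: static CP_NONE -> static 1252.
  A := R;

  // If a conversion takes place, one would expect a single byte tagged
  // as 1252 in A; without a conversion the two original bytes remain.
  WriteLn('R: cp=', StringCodePage(R), ' len=', Length(R));
  WriteLn('A: cp=', StringCodePage(A), ' len=', Length(A));
end.
```

On Unix-like systems a widestring manager (e.g. the cwstring unit) is 
probably needed before any real conversion can happen, so results 
should be compared with that in mind.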


> Exactly the same goes for converting strings with code page CP_NONE to a
> different code page: your program is broken when it tries to do that,
> and we cannot guarantee any outcome. This is exactly what "the behaviour
> is undefined" means.

When a string *really* has a *dynamic* encoding of CP_NONE, this of 
course is illegal and thus produces an undefined result. ACK, so 
far. But since Delphi (quietly) changes a SetCodePage to CP_NONE into 
the current CP_ACP, the undefined situation (invalid dynamic encoding) 
must have been forced by some illegal *hack* before, or in the FPC case 
by some erroneous (not Delphi-conforming) RTL code.
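
A small test along these lines could show what a given RTL does with 
such a request. Again this is only a sketch: the program name is made 
up, and I make no claim here about what FPC prints:

```pascal
program SetCpNone;
{$mode objfpc}{$H+}

var
  S: RawByteString;
begin
  // Build a unique, non-constant string of three ASCII bytes.
  SetLength(S, 3);
  S[1] := 'a';
  S[2] := 'b';
  S[3] := 'c';
  WriteLn('before: ', StringCodePage(S));

  // Request CP_NONE as the dynamic code page, without converting data.
  SetCodePage(S, CP_NONE, False);

  // CP_NONE (65535) here means the request was taken literally; any
  // other value means the RTL substituted a different code page.
  WriteLn('after:  ', StringCodePage(S));
end.
```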

DoDi



