cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)

Tomas Hajny XHajT03 at mbox.vol.cz
Wed Nov 11 23:46:00 CET 2009


On 11 Nov 09, at 20:53, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
> > >    begin
> > >      encoding:=windowsencoding2nativeencoding[encoding];
> > >    end;
> > >
> > > Delphi users would only have to define the fpc constants of (2) to their
> > > respective windows codepages to keep the code working.
> > 
> > Well... How do you make sure that MS doesn't come with an extension of the
> > supported codepages in the next version of MS Windows (or that they don't
> > support a different list in some special version, like a version for the
> > Chinese market) breaking your selection of "50 free values in Windows
> > range"?
> 
> In that unlikely case, change the range.

That raises the question of whether incompatibility between two FPC 
versions is better than incompatibility between FPC and Delphi 
(caused by the tight connection between Delphi and one particular 
platform)...


> > How does your list of "50 values" compare to 280 lines provided by (GNU)
> > recode -l (presumably matching to high extent to values supported by the
> > underlying libiconv library)? 
> 
> Like about 50/280. That's the point of "most used". For the less likely
> ones, define constants to the windows codepages.

I don't understand what you mean by "define constants to the windows 
codepages". I guess that I'm missing something there, but it seems to 
me that your proposal doesn't allow the use of some of the character 
sets. If we want to depend on MS changing their platform-specific use 
of certain constants, I can imagine that we should be able to find a 
gap in the Windows character set numbering to cover at least all the 
character sets registered by IANA. However, we need to provide a 
mapping between the MS Windows character set number and the native 
character set number for all character set numbers defined in Windows 
and supported by the particular platform, otherwise the compatibility 
argument doesn't hold any longer, does it?
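
To make the mapping requirement concrete, here is a minimal sketch of 
such a lookup. The record type, the table and the function name are 
hypothetical (nothing like this is claimed to exist in the cpstrnew 
branch), and the table holds only a handful of entries for 
illustration. The point is the failure case: any Windows code page 
missing from the table cannot be handled on the native platform.

```pascal
program CodepageMap;
{$mode objfpc}{$H+}

type
  // Hypothetical record pairing a Windows code page with a native name.
  TCodepageEntry = record
    WinCP: Word;          // Windows code page identifier
    NativeName: string;   // charset name as used by iconv-style libraries
  end;

const
  // Deliberately tiny, illustrative table. A real one would have to cover
  // every code page that Windows defines and the platform supports.
  CodepageTable: array[0..4] of TCodepageEntry = (
    (WinCP: 437;   NativeName: 'IBM437'),
    (WinCP: 1250;  NativeName: 'WINDOWS-1250'),
    (WinCP: 1252;  NativeName: 'WINDOWS-1252'),
    (WinCP: 28591; NativeName: 'ISO-8859-1'),
    (WinCP: 65001; NativeName: 'UTF-8')
  );

// Returns True and fills CharsetName if WinCP has an entry in the table.
function WinCPToNative(WinCP: Word; out CharsetName: string): Boolean;
var
  i: Integer;
begin
  Result := False;
  CharsetName := '';
  for i := Low(CodepageTable) to High(CodepageTable) do
    if CodepageTable[i].WinCP = WinCP then
    begin
      CharsetName := CodepageTable[i].NativeName;
      Exit(True);
    end;
end;

var
  CharsetName: string;
begin
  if WinCPToNative(1250, CharsetName) then
    WriteLn('1250 -> ', CharsetName);
  // Code page 1361 exists in Windows but is absent from this sample table.
  if not WinCPToNative(1361, CharsetName) then
    WriteLn('1361 has no mapping in this table');
end.
```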


> Note that this is all just a guesstimate on the size of the free ranges. But I'd
> rather not expand that too much.

I'm pretty sure that Windows actually supports fewer character sets 
than are defined by IANA. Since Windows already uses word values, 
there should be fairly large gaps. Looking at the MSDN documentation 
(http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are 
152 values defined altogether and there's currently e.g. just a 
single value used in the 3xxxx range, no value in 4xxxx, nothing 
between 38 and 436 (probably rather unlikely to change, I'd expect 
changes rather in other areas), nothing between 1362 and 9999, etc.
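
Searching for such gaps can be sketched as below. The list of used 
identifiers is a small hypothetical sample, not the full MSDN table, 
so the result is only meaningful relative to that sample; the 
threshold of 1000 is likewise an arbitrary choice for the 
illustration.

```pascal
program CodepageGaps;
{$mode objfpc}

const
  // Hypothetical sample in ascending order, NOT the full MSDN list.
  // Gaps are computed relative to this sample only.
  UsedCP: array[0..6] of LongInt = (37, 437, 850, 1250, 1252, 1361, 10000);
  MinGapSize = 1000;  // arbitrary threshold chosen for the illustration

// Number of unused identifiers strictly between two used ones.
function GapSize(Lower, Upper: LongInt): LongInt;
begin
  Result := Upper - Lower - 1;
end;

var
  i: Integer;
begin
  // Walk over neighbouring pairs and report every sufficiently large gap.
  for i := Low(UsedCP) to High(UsedCP) - 1 do
    if GapSize(UsedCP[i], UsedCP[i + 1]) >= MinGapSize then
      WriteLn('free: ', UsedCP[i] + 1, '..', UsedCP[i + 1] - 1,
              ' (', GapSize(UsedCP[i], UsedCP[i + 1]), ' values)');
end.
```

With this sample and threshold, the only range reported is 1362..9999. 
Feeding in the complete list from the MSDN page would be needed before 
drawing any conclusion about the other ranges.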


> > Isn't it necessary to also keep the character set names under Unix (as far
> > as I understand it, at least console character set information is provided
> > using charset name provided in an environment variable there)?
> 
> Put them in the table too, for Unix.

From a certain perspective, these text versions may be useful on all 
platforms (imagine HTML character set declarations). However, there's 
a risk that they may not be used completely consistently across all 
platforms (IANA definitions allow quite a few alternative versions 
of the character set names). BTW, the above-mentioned MSDN page also 
refers to some string identifier supposedly used for .NET, so I 
suspect that these will sooner or later become supported by Delphi 
too somehow. ;-)


> But what is the alternative? Delphi incompatibility? Everything homemade and
> incompatible?

We could e.g. use the MIBenum number defined by IANA as our primary 
identifier, that is not homemade. But the main point is IMHO 
understanding how these values are used (in FPC). If they're mainly 
used for checking whether a string stored in memory in some 
character set needs to be converted before e.g. I/O operations via 
the console, then we may actually prefer using platform-specific 
constants (i.e. different values for the same character set on 
different platforms) because that doesn't require any conversion 
(well, at least on platforms defining console codepages using numeric 
values). If we want/need to store these constants when storing 
strings to file streams and make the resulting files portable across 
platforms, then we obviously need to use the same constants on all 
platforms. If we assume a need to use the same stored streams in both 
Delphi and FPC programs, then this needs to be compatible between 
Delphi and FPC (are they compatible in other aspects?). As you can 
see, I'm still not that clear on the use cases at the moment.
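
A rough sketch of the "portable identifier" variant, assuming the 
IANA MIBenum is what gets stored with the string and each platform 
translates it to its own notion of a character set. All type, 
constant and function names here are hypothetical and the table is 
only a small excerpt; the MIBenum values and Windows code pages listed 
are the ones I believe to be correct for these four character sets.

```pascal
program MibEnumDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical record tying one portable id to platform-specific ones.
  TCharsetInfo = record
    MibEnum: Word;        // IANA MIBenum, used as the portable identifier
    WinCP: Word;          // corresponding Windows code page
    CharsetName: string;  // one of the IANA names, e.g. for iconv
  end;

const
  // Small excerpt for illustration only, not a complete registry.
  Charsets: array[0..3] of TCharsetInfo = (
    (MibEnum: 3;    WinCP: 20127; CharsetName: 'US-ASCII'),
    (MibEnum: 4;    WinCP: 28591; CharsetName: 'ISO-8859-1'),
    (MibEnum: 106;  WinCP: 65001; CharsetName: 'UTF-8'),
    (MibEnum: 2250; WinCP: 1250;  CharsetName: 'windows-1250')
  );

// Looks up an entry by its portable identifier.
function FindByMib(Mib: Word; out Info: TCharsetInfo): Boolean;
var
  i: Integer;
begin
  Result := False;
  Info := Default(TCharsetInfo);
  for i := Low(Charsets) to High(Charsets) do
    if Charsets[i].MibEnum = Mib then
    begin
      Info := Charsets[i];
      Exit(True);
    end;
end;

// Looks up an entry by its Windows code page.
function FindByWinCP(CP: Word; out Info: TCharsetInfo): Boolean;
var
  i: Integer;
begin
  Result := False;
  Info := Default(TCharsetInfo);
  for i := Low(Charsets) to High(Charsets) do
    if Charsets[i].WinCP = CP then
    begin
      Info := Charsets[i];
      Exit(True);
    end;
end;

var
  Info: TCharsetInfo;
begin
  // Writing side (e.g. Windows): local code page -> portable tag.
  if FindByWinCP(1250, Info) then
    WriteLn('store tag ', Info.MibEnum);
  // Reading side (e.g. Unix): portable tag -> charset name.
  if FindByMib(2250, Info) then
    WriteLn('convert from ', Info.CharsetName);
end.
```

The cost of this design is the extra lookup on every platform, which 
is exactly what the platform-specific constants would avoid for the 
console I/O case.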

Tomas



