cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)
Tomas Hajny
XHajT03 at mbox.vol.cz
Wed Nov 11 23:46:00 CET 2009
On 11 Nov 09, at 20:53, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
> > > begin
> > > encoding:=windowsencoding2nativeencoding[encoding];
> > > end;
> > >
> > > Delphi users would only have to define the fpc constants of (2) to their
> > > respective windows codepages to keep the code working.
> >
> > Well... How do you make sure that MS doesn't come with an extension of the
> > supported codepages in the next version of MS Windows (or that they don't
> > support a different list in some special version, like a version for the
> > Chinese market) breaking your selection of "50 free values in Windows
> > range"?
>
> In that unlikely case, change the range.
That raises the question of whether incompatibility between two FPC
versions is better than incompatibility between FPC and Delphi
(caused by the tight connection between Delphi and one particular
platform)...
> > How does your list of "50 values" compare to 280 lines provided by (GNU)
> > recode -l (presumably matching to high extent to values supported by the
> > underlying libiconv library)?
>
> Like about 50/280. That's the point of "most used". For the less likely
> ones, define constants to the windows codepages.
I don't understand what you mean by "define constants to the windows
codepages". I guess that I'm missing something there but it seems to
me that your proposal doesn't allow use of some of the character
sets. If we want to depend on MS changing their platform specific use
of certain constants, I can imagine that we should be able to find a
gap in the windows character set numbering to cover at least all the
character sets registered by IANA. However, we need to provide
mapping between the MS Windows character set number and the native
character set number for all character set numbers defined in Windows
and supported by the particular platform, otherwise the compatibility
argument doesn't hold any longer, does it?
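
To make the mapping concern concrete, here is a minimal sketch of a lookup from a Windows code page number to a native identifier that reports a missing entry explicitly. Everything in it is an assumption for illustration only: the names TCodepageMapEntry, CodepageMap and WindowsToNative are hypothetical, the native identifier is assumed to be a character set name (as on Unix), and the table is deliberately tiny.

```pascal
program CodepageMapSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical table entry, not an existing FPC type.
  TCodepageMapEntry = record
    WindowsCP: Word;     // Windows code page identifier
    NativeName: string;  // assumed native identifier (a charset name)
  end;

const
  // Illustrative subset only; the compatibility argument needs an entry
  // for every Windows code page the platform can actually handle.
  CodepageMap: array[0..3] of TCodepageMapEntry = (
    (WindowsCP: 437;   NativeName: 'IBM437'),
    (WindowsCP: 1250;  NativeName: 'windows-1250'),
    (WindowsCP: 1252;  NativeName: 'windows-1252'),
    (WindowsCP: 65001; NativeName: 'UTF-8')
  );

// Returns True and fills NativeName when CP is present in the table;
// returns False (and an empty name) when no mapping is known.
function WindowsToNative(CP: Word; out NativeName: string): Boolean;
var
  i: Integer;
begin
  NativeName := '';
  for i := Low(CodepageMap) to High(CodepageMap) do
    if CodepageMap[i].WindowsCP = CP then
    begin
      NativeName := CodepageMap[i].NativeName;
      Exit(True);
    end;
  Result := False;
end;

var
  Name: string;
begin
  if WindowsToNative(1252, Name) then
    WriteLn('1252 maps to ', Name);
  // 20127 (US-ASCII) is a valid Windows code page, but it is missing from
  // this small table, so the caller has to deal with the gap.
  if not WindowsToNative(20127, Name) then
    WriteLn('20127 has no native mapping in this table');
end.
```

A plain array indexed by the Windows number, as in the quoted snippet, cannot signal such a gap unless a reserved "unknown" value is stored in it.
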
> Note that is all just a guestimate on the size of the free ranges. But I
> rather not expand that too much.
I'm pretty sure that Windows actually supports fewer character sets
than what is defined in IANA. Since Windows already uses word values,
there should be fairly large gaps. Looking at the MSDN documentation
(http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are
152 values defined altogether and there's currently e.g. just a
single value used in the 3xxxx range, no value in 4xxxx, nothing
between 38 and 436 (probably rather unlikely to change, I'd expect
changes rather in other areas), nothing between 1362 and 9999, etc.
> > Isn't it necessary to also keep the character set names under Unix (as far
> > as I understand it, at least console character set information is provided
> > using charset name provided in an environment variable there)?
>
> Put them in the table too, for Unix.
From a certain perspective, these text versions may be useful for all
platforms (imagine HTML character set declarations). However, there's
a risk that they may not be used completely consistently across all
platforms (IANA definitions allow quite a few alternative versions
for the character set names). BTW, the above mentioned MSDN page also
refers to some string identifier supposedly used for .NET, so I
suspect that these will sooner or later be supported by Delphi too
somehow. ;-)
> But what is the alternative? Delphi incompability ? Everything homemade and
> incompatible?
We could e.g. use the MIBENUM number defined by IANA as our primary
identifier, that is not homemade. But the main point is IMHO
understanding how these values are used (in FPC). If they're mainly
used for checking whether the string stored in memory in some
character set needs to be converted before e.g. I/O operations via
console then we may actually prefer using platform specific constants
(i.e. different values for the same character set on different
platforms) because that doesn't require any conversion (well, at
least on platforms defining console codepages using numeric values).
If we want/need to store these constants when storing strings to file
streams and make the resulting files portable across platforms then
we obviously need to use the same constants for all platforms. If we
assume a need to use the same stored streams in both Delphi and FPC
programs then this needs to be compatible between Delphi and FPC (are
they compatible in other aspects?). As you can see, I'm still not
that clear on the use cases at the moment.
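
As a rough illustration of the MIBenum option, the sketch below keeps the IANA MIBenum as the portable identifier (the value that would go into a stored stream) and translates it to a platform specific number, here the Windows code page, only at the I/O boundary. The type and function names are hypothetical and the table is a small subset chosen for the example, not a proposal for the actual RTL interface.

```pascal
program MibEnumSketch;
{$mode objfpc}{$H+}

type
  // Hypothetical pairing of a portable and a platform specific identifier.
  TCharsetEntry = record
    MibEnum: Word;    // IANA MIBenum: the portable, stored identifier
    WindowsCP: Word;  // Windows code page: one platform specific value
  end;

const
  // Small illustrative subset, not a complete registry.
  Charsets: array[0..4] of TCharsetEntry = (
    (MibEnum: 3;    WindowsCP: 20127),  // US-ASCII
    (MibEnum: 4;    WindowsCP: 28591),  // ISO-8859-1
    (MibEnum: 106;  WindowsCP: 65001),  // UTF-8
    (MibEnum: 2250; WindowsCP: 1250),   // windows-1250
    (MibEnum: 2252; WindowsCP: 1252)    // windows-1252
  );

// Portable identifier -> platform identifier (used before console I/O).
function MibEnumToWindowsCP(Mib: Word; out CP: Word): Boolean;
var
  i: Integer;
begin
  CP := 0;
  for i := Low(Charsets) to High(Charsets) do
    if Charsets[i].MibEnum = Mib then
    begin
      CP := Charsets[i].WindowsCP;
      Exit(True);
    end;
  Result := False;
end;

// Platform identifier -> portable identifier (used before storing).
function WindowsCPToMibEnum(CP: Word; out Mib: Word): Boolean;
var
  i: Integer;
begin
  Mib := 0;
  for i := Low(Charsets) to High(Charsets) do
    if Charsets[i].WindowsCP = CP then
    begin
      Mib := Charsets[i].MibEnum;
      Exit(True);
    end;
  Result := False;
end;

var
  CP, Mib: Word;
begin
  if MibEnumToWindowsCP(106, CP) then
    WriteLn('MIBenum 106 maps to Windows code page ', CP);
  if WindowsCPToMibEnum(1252, Mib) then
    WriteLn('Windows code page 1252 maps to MIBenum ', Mib);
end.
```

The trade-off is the one described above: a stored stream stays portable, but every console operation pays for a lookup that platform specific constants would avoid.
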
Tomas