cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)

Marco van de Voort marcov at stack.nl
Thu Nov 12 08:56:52 CET 2009


In our previous episode, Tomas Hajny said:
> > > supported codepages in the next version of MS Windows (or that they don't
> > > support a different list in some special version, like a version for the
> > > Chinese market) breaking your selection of "50 free values in Windows
> > > range"?
> > 
> > In that unlikely case, change the range.
> 
> That raises a question whether incompatibility between two FPC 
> versions

Incompatibility how exactly? Two different FPC versions are already not
compatible.

> is better than incompatibility between FPC and Delphi (caused by tight
> connection between Delphi and one particular platform)...
 
That would be source incompatibility, and therefore much worse.
 
> > Like about 50/280. That's the point of "most used". For the less likely
> > ones, define constants to the windows codepages.
> 
> I don't understand what you mean by "define constants to the windows 
> codepages".

The 16-bit range is split between a short FPC range and a long
Delphi/Windows range. Rarely used codepages keep their Windows codepage
number, and if a non-Windows OS supports such a codepage, it must implement
a windows2local codepage number conversion.
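
As a rough sketch of that split, in Python for brevity: the FPC range
100..149 and the table contents below are assumptions for illustration only,
not a proposal. windows2local is the conversion named above; classify and
the table name are hypothetical.

```python
# Hypothetical split of the 16-bit codepage space. The range 100..149 is an
# assumed "short FPC range"; every other value is treated as a Windows number.
FPC_RANGE = range(100, 150)

# Assumed example table for a Unix-like target:
# Windows codepage number -> local (iconv-style) charset name.
WINDOWS_TO_LOCAL = {
    437: "CP437",
    850: "CP850",
    1252: "CP1252",
    28591: "ISO-8859-1",
    65001: "UTF-8",
}


def classify(cp):
    """Say which part of the 16-bit range a codepage number falls into."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("codepage number must fit in 16 bits")
    return "fpc" if cp in FPC_RANGE else "windows"


def windows2local(cp):
    """Map a Windows codepage number to the local charset name.

    Returns None when the number is in the FPC range or when this platform
    has no equivalent for it.
    """
    if classify(cp) != "windows":
        return None
    return WINDOWS_TO_LOCAL.get(cp)
```

Only platforms that do not use Windows numbers natively would need such a
table, and only for the codepages they actually support.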

> of certain constants, I can imagine that we should be able to find a 
> gap in the windows character set numbering to cover at least all the 
> character sets registered by IANA.

Implementing them at all only makes sense if OSes implement them exactly.
Several Windows codepages might map to corresponding IANA sets.

> However, we need to provide mapping between the MS Windows character set
> number and the native character set number for all character set numbers
> defined in Windows and supported by the particular platform, otherwise the
> compatibility argument doesn't hold any longer, does it?

In the same way, you must be able to map the IANA sets to the sets that are
actually supported on each platform.
 
> > Note that this is all just a guesstimate of the size of the free ranges.
> > But I'd rather not expand that too much.
> 
> I'm pretty sure that Windows actually supports fewer character sets 
> than what is defined in IANA. Since Windows already uses word values, 
> there should be fairly large gaps. Looking at the MSDN documentation 
> (http://msdn.microsoft.com/en-us/library/dd317756.aspx), there are 
> 152 values defined altogether and there's currently e.g. just a 
> single value used in the 3xxxx range, no value in 4xxxx, nothing 
> between 38 and 436 (probably rather unlikely to change, I'd expect 
> changes rather in other areas), nothing between 1362 and 9999, etc.

If the ranges are large enough we can try to fit them all in somewhere. But
this means the lesser-used codepages are also in there twice, blowing up
lookup tables or codepages.
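
To make the cost concrete, here is a small sketch in Python. The set of
identifiers is only a partial sample of the Windows list, and the alias
values 100 and 101 are hypothetical; both helper names are made up for
illustration.

```python
# Partial sample of Windows codepage identifiers; NOT the complete list.
KNOWN_WINDOWS_IDS = {37, 437, 850, 1252, 1361, 10000, 28591, 65001}


def is_range_free(used, first, last):
    """True if no identifier in `used` lies inside first..last (inclusive)."""
    return not any(first <= cp <= last for cp in used)


def build_lookup(windows_ids, aliases):
    """Build a table: identifier -> canonical Windows identifier.

    `aliases` maps extra (hypothetical) identifiers onto existing Windows
    ones, so every alias adds one more entry to the table.
    """
    table = {cp: cp for cp in windows_ids}
    for alias, target in aliases.items():
        if alias in table:
            raise ValueError(f"alias {alias} collides with an existing id")
        table[alias] = target
    return table
```

With this sample, is_range_free(KNOWN_WINDOWS_IDS, 100, 149) holds, but that
only says the range is free relative to the sample. Each alias passed to
build_lookup grows the table by one entry, which is the duplication meant
above.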
 
> > > as I understand it, at least console character set information is provided
> > > using charset name provided in an environment variable there)?
> > 
> > Put them in the table too, for Unix.
> 
> From a certain perspective, these text versions may be useful for all 
> platforms (imagine HTML character set declarations).

> However, there's a risk that they may not be used completely consistently
> across all platforms (IANA definitions allow quite a few alternative
> versions for the character set names). BTW, the above mentioned MSDN page
> also refers to some string identifier supposedly used for .NET, so I
> suspect that these become sooner or later supported by Delphi too somehow.
> ;-)

I'd wait until this is entirely certain before exposing these names, and then
only on platforms that need them. Otherwise, before long, we'll find ourselves
carrying three strings per codepage in every library on all platforms.

Moreover, many OSes might already provide a way to resolve numbers to names.
 
> > But what is the alternative? Delphi incompatibility? Everything homemade
> > and incompatible?
> 
> We could e.g. use the MIBENUM number defined by IANA as our primary 
> identifier, that is not homemade. But the main point is IMHO 
> understanding how these values are used (in FPC). If they're mainly 
> used for checking whether the string stored in memory in some 
> character set needs to be converted before e.g. I/O operations via 
> console then we may actually prefer using platform specific constants 
> (i.e. different values for the same character set on different 
> platforms) because that doesn't require any conversion (well, at 
> least on platforms defining console codepages using numeric values). 
> If we want/need to store these constants when storing strings to file 
> streams and make the resulting files portable across platforms then 
> we obviously need to use the same constants for all platforms. If we 
> assume need for using the same stored streams in both Delphi and FPC 
> programs then this needs to be compatible between Delphi and FPC (are 
> they compatible in other aspects?). As you can see, I'm still not 
> that clear on the use cases at the moment.

It would greatly confuse FPC-Delphi projects for almost no benefit.
The problem is not even the change itself, but hunting down every use of these
constants, ifdef'ing them, getting the changes accepted, etc.


