cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)

Marco van de Voort marcov at stack.nl
Thu Nov 12 14:19:34 CET 2009

In our previous episode, Tomas Hajny said:
> > Incompatibility how exactly? Two different FPC versions are already not
> > compatible.
> If you need to change the used range between e.g. FPC 2.6.x and 2.8.x (due
> to MS extending their use of the codepage values into the range we decided
> to use in FPC), this makes 2.6.x and 2.8.x incompatible to each other,
> right?

I don't see how. Not if the symbolic constants are used, which is the whole
point. Note that the chance of this is already slim, especially if the ranges
are as big as you say.

> >> is better than incompatibility between FPC and Delphi (caused by tight
> >> connection between Delphi and one particular platform)...
> >
> > That would be source incompatibility, and therefore much worse.
> First, this may be the case for compatibility between two FPC versions
> too.

Not if we specify from the start that the RTL predefined symbolic constants
are the only supported values, and that the windows codepages _might_ work
depending on platform. (e.g. to avoid having too much overhead on embedded
targets). IOW numeric values are undefined, but might map to windows
codepages if the platform supports it.
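
As a rough illustration of the "symbolic only" rule, here is a minimal sketch
in Python (chosen only for brevity; it is not FPC RTL code). The constant
names and the numeric values are invented for the example. The point it tries
to show is that source which sticks to the symbols keeps working when the
numbers behind them move between releases.

```python
# Hypothetical sketch: the names and numbers below are invented for
# illustration; they are not actual FPC RTL constants.

# Two imagined RTL releases that assign different numeric values to the
# same symbolic constants (e.g. because a vendor claimed part of the range).
RTL_A = {"CP_FPC_LATIN1": 60001, "CP_FPC_KOI8R": 60002}
RTL_B = {"CP_FPC_LATIN1": 61001, "CP_FPC_KOI8R": 61002}


def tag_string(rtl, symbol, text):
    """Attach a codepage to a string using only the symbolic name."""
    return (rtl[symbol], text)


def codepage_name(rtl, tagged):
    """Recover the symbolic name of a tagged string under the same RTL."""
    number, _ = tagged
    for name, value in rtl.items():
        if value == number:
            return name
    raise KeyError(number)
```

Code that hardcodes 60001 would only be correct against the first table; code
that says CP_FPC_LATIN1 is correct against both.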

We have that luxury. The values in Delphi code with windows codepages are
already out there, and we have no real power to change that.

Sure, you can ultimately convince Delphi open source projects to use FPC
values and define them for themselves, but that is hard, and it is a problem
that you face with each new piece of Delphi code. Again and again.

> Second, the relation between the numeric values appearing in FPC sources
> and how the compiler translates the sources to the internal representation
> in memory (which is possibly only valid for the particular platform) is
> something that may not be the same (depending on the use cases, of
> course).

Yes, you could put a translation (xlat) layer in the parser (mapping the
number in the source to the unicode encoding word).
That would break much less code (only Delphi assembler code), but it pulls
the lot into the compiler.

Not desirable IMHO, and moreover, I'm not convinced there really is a
problem that warrants such draconian measures in the first place.

> > and if foreign OSes support that, they must implement a windows2local
> > codepage number conversion.
> As far as I'm concerned, I'm fine with providing a translation table
> between Windows codepages and individual platforms (e.g. OS/2), but I'm
> less comfortable with having to use this translation at runtime under all
> platforms except for Windows and I'm somewhat worried about not having a
> solution for supporting character set which may be used e.g. for console
> on non-windows platforms but are not supported by Windows (have a look at
> the URL sent by Jonas yesterday for Mac OS X; without having performed
> complete comparison, it seemed to contain some character sets not listed
> on the MSDN page for Windows).

The lookup only happens at the moment of the iconv call, which is orders of
magnitude more expensive. The example with windows code pages was a bit
windows centric, but it was only an example. The transformation from the word
to <whatever the encoding procedure uses> is platform dependent.
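
A minimal sketch of that lookup, again in Python and under stated
assumptions: Python's codec machinery stands in for iconv, the table is
hypothetical, and 60001 is an invented value standing in for an FPC-defined
constant. The other numbers are the usual Windows codepage identifiers.

```python
# Hypothetical per-platform table: 16-bit codepage word -> whatever the
# platform's conversion routine wants (here: a Python codec name).
CODEPAGE_TO_CODEC = {
    1252: "cp1252",      # Windows Western European
    20866: "koi8_r",     # KOI8-R
    28591: "iso8859_1",  # ISO 8859-1
    65001: "utf_8",      # UTF-8
    60001: "mac_roman",  # invented FPC-range value, for illustration only
}


def convert_to_utf8(raw: bytes, codepage: int) -> bytes:
    """Convert raw bytes tagged with a codepage word to UTF-8."""
    try:
        codec = CODEPAGE_TO_CODEC[codepage]  # the cheap part: one lookup
    except KeyError:
        raise LookupError(f"codepage {codepage} not supported on this platform")
    # The expensive part: the actual conversion.
    return raw.decode(codec).encode("utf-8")
```

The table lookup happens once per conversion, so its cost is small next to
the conversion itself.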

In the windows case this means a lookup has to be inserted to handle the FPC
predefined ones. For other platforms a lookup has to be inserted no matter

> >> of certain constants, I can imagine that we should be able to find a
> >> gap in the windows character set numbering to cover at least all the
> >> character sets registered by IANA.
> >
> > Implementing at all only makes sense if OSes implement them exactly.
> > Several Windows codepages might map to corresponding IANA sets.
> Do you have some examples of this case?

I never brought up IANA :-) The point is that while IANA might be a standard, the
APIs probably don't use IANA numbers as 
> >> compatibility argument doesn't hold any longer, does it?
> >
> > Just like that you must be able to map the IANA sets to actually supported
> > sets on all platforms.
> Yes, absolutely. The only potential advantage of IANA numbers would be
> ensured compatibility across future FPC versions without risk that we need
> to "remap" the codepage numbers in the future due to MS or some other
> vendor changing use of their platform specific constants. I don't say that
> this is a must or necessarily the best option, just an option we may want
> to consider depending on the use cases (see below).

Only if you guarantee numeric compatibility on the FPC side, which is
something I don't plan to do. Only symbolic compatibility (FPC_IANA1_English
or whatever, not the corresponding numeric value).

> > If the ranges are large enough we can try to fit them in all somewhere.
> > But this means the lesser used codepages are also in twice, blowing up
> > lookuptables or codepages.
> Yes. Either at compile time (where it makes no difference at all), or
> possibly also at runtime where this means something like 1600 bytes on
> 32-bit platforms (assuming 200 records with 2 fields of 4 bytes each).

And more if the target codepage identifier is a string, yes; less if you
can build the table in code. But if 2 bytes are enough for a certain target,
you do that.
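
The arithmetic can be checked with a few lines of Python. The record layouts
are assumptions made for the example (in particular the 16-byte name field is
an arbitrary choice); only the 200 records and the 4-byte fields come from
the estimate quoted above.

```python
import struct


def table_bytes(records: int, record_format: str) -> int:
    """Size of a packed lookup table with the given record layout."""
    return records * struct.calcsize(record_format)


# Two 4-byte fields per record, as in the quoted estimate.
size_32bit = table_bytes(200, "<II")    # 200 * 8 bytes

# Two 2-byte fields, if 16 bits are enough for the target.
size_16bit = table_bytes(200, "<HH")    # 200 * 4 bytes

# A 4-byte number plus a name of (assumed) 16 bytes per record.
size_named = table_bytes(200, "<I16s")  # 200 * 20 bytes

print(size_32bit, size_16bit, size_named)
```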

But more importantly, I can't think of any other way to prevent
this, since you won't find any sequence of numbers that you can pass to all
OSes' APIs without translation.

> > I'd wait till this is entirely sure before exposing these names, and only
> > on platforms that need them. Otherwise we find ourselves with 3 strings per
> > codepage on all platforms before long in any library.
> >
> > Moreover, many OSes might already provide a way to resolve numbers to
> > names.
> Could be. If we need to maintain them anyway, we might also provide it as
> a platform independent functionality (possibly also as an optional
> additional unit, "just" based on the same include file defining it for
> platforms which need this mapping for runtime anyway due to not having
> numeric values associated with the supported character sets).
I don't see what this would bring.
> >> programs then this needs to be compatible between Delphi and FPC (are
> >> they compatible in other aspects?). As you can see, I'm still not
> >> that clear on the use cases at the moment.
> >
> > It would greatly confuse FPC-Delphi projects for a nearly sterile benefit.
> > The problem is not even the change itself, but actually hunting them down,
> > ifdefing them, getting the changes accepted etc.
> I'm afraid that you haven't helped me too much with my questions regarding
> the use cases. I'm still convinced that we should understand them first
> before deciding on the FPC implementation (e.g. whether we translate some
> Windows/Delphi constants to the platform specific codepage numbers at
> compile time or at runtime).

I discounted compile time because it is only doable for platforms that have a
conversion based on a 16-bit integer, and I knew that iconv's is not (it
takes encoding names).

Still, translation at compile time could be used to streamline the
identifiers (windows and own range) and make them sequential.
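
One way to read "sequential" is a dense renumbering of the sparse
identifiers, so that a per-platform table can be a plain array. A small
Python sketch of that idea follows; the list of identifiers is made up for
the example (60001 and 60002 are invented "own range" values) and nothing
here reflects an actual compiler implementation.

```python
# Sparse identifiers as they might appear in source: Windows codepage
# numbers plus two invented values from an FPC-defined range.
SPARSE_IDS = [1252, 20866, 28591, 65001, 60001, 60002]

# What a compile-time pass could produce: sparse id -> dense index 0..n-1.
DENSE_INDEX = {cp: i for i, cp in enumerate(sorted(SPARSE_IDS))}


def dense_index(codepage: int) -> int:
    """Map a source-level codepage identifier to its sequential index."""
    return DENSE_INDEX[codepage]
```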
