cpstrnew branch (was Re: [fpc-devel] Freepascal 2.4.0rc1 released)

Tomas Hajny XHajT03 at mbox.vol.cz
Thu Nov 12 18:28:09 CET 2009

On Thu, November 12, 2009 14:19, Marco van de Voort wrote:
> In our previous episode, Tomas Hajny said:
>> > Incompatibility how exactly? Two different FPC versions are already
>> > not compatible.
>> If you need to change the used range between e.g. FPC 2.6.x and 2.8.x
>> (due to MS extending their use of the codepage values into the range we
>> decided to use in FPC), this makes 2.6.x and 2.8.x incompatible to each
>> other, right?
> I don't see how. Not if the symbolic constants are used, which is the
> whole point. Note that this is already a slim chance. Specially if the
> ranges are as big as you say.

OK, I see. That hasn't been my understanding. Still, are we sure that it
only affects the usability of existing source files and nothing else?
Is it really certain that the codepage number is never written into a file
when storing the strings? Otherwise compatibility at the level of numeric
values may be necessary.

>> >> is better than incompatibility between FPC and Delphi (caused by
>> >> tight connection between Delphi and one particular platform)...
>> >
>> > That would be source incompatibility, and therefore much worse.
>> First, this may be the case for compatibility between two FPC
>> versions too.
> Not if we specify from the start that the RTL predefined symbolic
> constants are the only supported values, and that the windows codepages
> _might_ work depending on platform. (e.g. to avoid having too much
> overhead on embedded targets). IOW numeric values are undefined, but
> might map to windows codepages if the platform supports it.

I'm not sure whether we can really get this message across. :-( People
who are interested in working at the level of individual codepages would
be exactly those who would probably never bother translating the
codepage value (as required by Delphi) into some symbolic constant (not
supported by Delphi)...
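
To make the arrangement under discussion concrete, here is a minimal
sketch, written in Python purely for brevity. Every name in it (CodePage,
resolve, allow_windows_numbers) is hypothetical and not part of FPC or
Delphi; it only illustrates "symbolic constants are the supported
interface, raw Windows numbers might work if the platform opts in".

```python
from enum import Enum


class CodePage(Enum):
    """Hypothetical symbolic identifiers; member values are an internal detail."""
    UTF8 = "utf-8"
    LATIN1 = "iso8859-1"
    WIN_1252 = "cp1252"


# Raw Windows codepage numbers, honoured only where the platform opts in.
_WINDOWS_NUMBERS = {
    65001: CodePage.UTF8,
    28591: CodePage.LATIN1,
    1252: CodePage.WIN_1252,
}


def resolve(code_page, allow_windows_numbers=False):
    """Return the local encoding name for a symbolic constant or raw number."""
    if isinstance(code_page, CodePage):
        return code_page.value
    if allow_windows_numbers and code_page in _WINDOWS_NUMBERS:
        return _WINDOWS_NUMBERS[code_page].value
    raise ValueError(f"unsupported codepage identifier: {code_page!r}")


print(resolve(CodePage.UTF8))
print(resolve(1252, allow_windows_numbers=True))
```

Code that passes a bare 1252 keeps working only where the platform sets
allow_windows_numbers, which is exactly the case I expect most Delphi
sources to fall into.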

> We have that luxury. The values in Delphi code with windows codepages are
> already out there, and we have no real power to change that.

We have the luxury, but in that case we can almost equally well skip this
definition of symbolic constants altogether, because I suspect that hardly
anyone will use them anyway. Still, I'm more concerned about the
(unnecessary) runtime overhead.

> Sure, you can ultimately convince Delphi open source projects to use FPC
> values and define them for themselves, but is hard, and a problem that you
> face with each new piece of Delphi code. Again and again.

Completely true.

>> Second, the relation between the numeric values appearing in FPC sources
>> and how the compiler translates the sources to the internal
>> representation in memory (which is possibly only valid for the
>> particular platform) is something that may not be the same (depending
>> on the use cases, of course).
> Yes, you could lay a xlat layer within the parser. (source number to
> unicode encoding word mapping)
> That will break much less code. (only Delphi assembler code), but pulls
> the lot into the compiler.
> Not desirable IMHO, and moreover, I'm not convinced there really is a
> problem that warrants such draconian measures in the first place.

What draconian measure? Per-platform mapping? The translation has much
lower impact if performed once at compile time than every time at runtime.
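
A rough sketch of the difference, again in Python only for illustration.
The table and both function names are hypothetical; Python codec names
stand in for whatever identifier the local conversion routine expects, and
"resolved once" stands in for a translation done at compile time.

```python
import codecs

# Hypothetical table: Windows codepage number -> identifier used by the
# local conversion routine (Python codec names stand in for it here).
WINDOWS_CP_TO_LOCAL = {
    437: "cp437",
    850: "cp850",
    1252: "cp1252",
    65001: "utf-8",
}


def decode_with_runtime_lookup(data, windows_cp):
    """Translate the Windows number on every single conversion."""
    return data.decode(WINDOWS_CP_TO_LOCAL[windows_cp])


def bind_decoder(windows_cp):
    """Translate the Windows number once and keep the resolved codec.

    The table is consulted when the decoder is created, not on each call.
    """
    codec = codecs.lookup(WINDOWS_CP_TO_LOCAL[windows_cp])

    def decode(data):
        text, _consumed = codec.decode(data)
        return text

    return decode


decode_cp850 = bind_decoder(850)
print(decode_cp850(b"caf\x82"))
print(decode_with_runtime_lookup(b"caf\xe9", 1252))
```

Both calls produce the same text; the difference is only where the table
lookup happens.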

>> > and if foreign OSes support that, they must implement a windows2local
>> > codepage number conversion.
>> As far as I'm concerned, I'm fine with providing a translation table
>> between Windows codepages and individual platforms (e.g. OS/2), but I'm
>> less comfortable with having to use this translation at runtime under
>> all platforms except for Windows and I'm somewhat worried about not
>> having a solution for supporting character set which may be used e.g.
>> for console on non-windows platforms but are not supported by Windows
>> (have a look at the URL sent by Jonas yesterday for Mac OS X; without
>> having performed complete comparison, it seemed to contain some
>> character sets not listed on the MSDN page for Windows).
> The lookup only happens at the iconv moment, which is magnitudes more
> expensive. The example to windows code pages was a bit windows centric,
> but was only an example. The word to
> <whatever the encoding procedure uses> transformation is platform
> dependant.
> In the windows case this means a lookup has to be inserted to handle the
> predefined ones. For other platforms a lookup has to be inserted no matter
> what.

Yes. However, possibly at compile time only (if source file compatibility
is the only issue we are concerned about; I'm still not clear about that).

>> >> of certain constants, I can imagine that we should be able to find a
>> >> gap in the windows character set numbering to cover at least all the
>> >> character sets registered by IANA.
>> >
>> > Implementing at all only makes sense if OSes implement them exactly.
>> > Several Windows codepages might map to corresponding IANA sets.
>> Do you have some examples of this case?
> I never brought up IANA :-) The point is while IANA might be a standard,
> the APIs probably don't use IANA numbers as

IANA was an example of what we could use as a common numbering of
character sets in FPC for all platforms (if this is necessary at all, of
course), to avoid having to "steal" some values in the Windows range for
character sets not supported by MS Windows. In the end we need to support
all the character sets supported by the individual platforms, because of
translation from/to the console character set, and we have no control over
MS using more values at any time.

I asked about your examples because I don't think that several MS Windows
codepages might map to individual IANA sets (I believe that each MS
Windows codepage refers to exactly one character set as registered in
IANA; on top of that, there are obviously character sets registered in
IANA but not supported in MS Windows).
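
The question can be checked mechanically once a table exists. Below is a
small sketch in Python (chosen only for brevity) with a short excerpt of
well-known Windows codepage numbers and their IANA names. The helper name
shared_names is hypothetical, and the excerpt is far too small to prove
anything about the complete table; it merely shows the kind of check I
have in mind.

```python
import codecs

# Windows codepage number -> IANA charset name (small, well-known excerpt).
WINDOWS_TO_IANA = {
    437: "IBM437",
    850: "IBM850",
    1250: "windows-1250",
    1252: "windows-1252",
    20127: "US-ASCII",
    28591: "ISO-8859-1",
    65001: "UTF-8",
}


def shared_names(table):
    """Return the IANA names that more than one codepage number maps to."""
    by_name = {}
    for number, name in table.items():
        by_name.setdefault(name.lower(), []).append(number)
    return {name: nums for name, nums in by_name.items() if len(nums) > 1}


# An empty result means each number in the excerpt has its own IANA name.
print(shared_names(WINDOWS_TO_IANA))

# Each name in the excerpt is also known to Python's codec registry.
for number, name in sorted(WINDOWS_TO_IANA.items()):
    print(number, name, codecs.lookup(name).name)
```

If the same check over the full list ever returned a non-empty result,
that would be the example I was asking for.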

>> > I'd wait till this is entirely sure before exposing these names, and
>> > only on platforms that need them. Otherwise we find ourselves with 3
>> > strings per codepage on all platforms before long in any library.
>> >
>> > Moreover, many OSes might already provide a way to resolve numbers to
>> > names.
>> Could be. If we need to maintain them anyway, we might also provide it
>> as a platform independent functionality (possibly also as an optional
>> additional unit, "just" based on the same include file defining it for
>> platforms which need this mapping for runtime anyway due to not having
>> numeric values associated with the supported character sets).
> I don't see what this would bring.

OK, I do see some potential benefit there (especially for platforms not
using the character set names as primary identifiers and thus requiring
programmers to perform this translation in a platform or application
specific way), but this was just a thought, nothing really important.

>> >> programs then this needs to be compatible between Delphi and FPC (are
>> >> they compatible in other aspects?). As you can see, I'm still not
>> >> that clear on the use cases at the moment.
>> >
>> > It would greatly confuse FPC-Delphi projects for a nearly sterile
>> > benefit.
>> > The problem is not even the change itself, but actually hunting them
>> > down, ifdefing them, getting the changes accepted etc.
>> I'm afraid that you haven't helped me too much with my questions
>> regarding the use cases. I'm still convinced that we should understand
>> them first before deciding on the FPC implementation (e.g. whether we
>> translate some Windows/Delphi constants to the platform specific
>> codepage numbers at compile time or at runtime).
> I discounted compiletime because it is only doable for platforms that have
> a conversion based on a 16-bit integer. And I knew iconv doesn't.

At least Mac OS X and OS/2 do, and such a translation is far more
efficient than working at the level of string constants (if that is
supported on that platform as an alternative option, like with iconv on
Mac OS X).

> Still it could be done to streamline the identifiers (windows and own
> range) and make them sequential.

Yes, this would also have certain benefits. My only aim was to clarify how
that would work and point out some open areas resulting from the fact that
Delphi didn't have to solve multi-platform support.

