[fpc-devel] On a port of Free Pascal to the IBM 370

Wed Jan 18 12:28:37 CET 2012

On Wed, January 18, 2012 11:57, michael.vancanneyt at wisa.be wrote:
> On Wed, 18 Jan 2012, Tomas Hajny wrote:
>> On Wed, January 18, 2012 11:23, michael.vancanneyt at wisa.be wrote:
>>> On Wed, 18 Jan 2012, Tomas Hajny wrote:
>>>> On Wed, January 18, 2012 10:15, michael.vancanneyt at wisa.be wrote:
>>>>> On Wed, 18 Jan 2012, Michael Schnell wrote:
>>>>>
>>>>>> AFAI learned:
>>>>>> I suppose the code generator should be doable, regarding that there
>>>>>> already
>>>>>> are several supported CPUs. At least a working compiler might come
>>>>>> into
>>>>>> existence in a decent amount of time, adding optimizations is
>>>>>> another
>>>>>> project.
>>>>>>
>>>>>> OTOH I suppose that a porting the RTL to a mainframe OS will not be
>>>>>> easy
>>>>>> and
>>>>>> without this the compiler is quite useless.
>>>>>
>>>>> I do not think it is more difficult than any other OS.
>>>>
>>>> ...except for the EBCDIC stuff, because the common parts of our RTL
>>>> assume
>>>> ASCII in many places (most of them probably not that difficult to fix
>>>> by
>>>> adding some platform specific constants changing the behaviour from
>>>> ASCII
>>>> only to consider EBCDIC too, but scattered around many places and thus
>>>> difficult to find). That doesn't mean it shouldn't be doable, of
>>>> course,
>>>> it will just require debugging even parts which didn't have to be
>>>> touched
>>>> during ports to other operating systems.
>>>
>>> ? It just means you must convert ascii to ebcdic in OS calls that
>>> require
>>> strings. All these calls must be re-implemented anyway, so a generic
>>> routine
>>> to do this conversion seems like the obvious path. I doubt this will be
>>> the
>>> real bottleneck :-)
>>
>> It should not be a bottleneck, but I'm afraid that you underestimate it
>> a
>> bit. As an example, searching for #10 and #9 across files (just) in
>> rtl/inc (there's much more in rtl/objpas) shows quite a few places which
>> would need to be changed for EBCDIC support and which are not touched
>> otherwise during a RTL port (control characters have completely
>> different
>> layout in EBCDIC compared to ASCII). Also case insensitive search for
>> "'a'" (just as an example - there are more ways how this can appear in
>> the
>> code) finds several places containing code assuming either certain
>> position of the standard alphabet ('a'..'z') in the character set - both
>> assumptions regarding the absolute value of 'a' (or 'A') used e.g. for
>> translation of hexadecimal numbers, or assumptions about the whole
>> alphabet being in one consecutive range (which is not the case for
>> EBCDIC).
>>
>
> But then you are assuming the RTL should be using EBCDIC internally as
> well ?
> Obviously, that will be a lot more work.
>
> But I don't think this should be so.

I may be overlooking something, of course. However: Our RTL is based on
common (target specific) routines for reading (and writing) text and
binary files (do_read & do_write). You cannot translate between ASCII and
EBCDIC in these target specific routines because you don't know how the
input would be used at that point. Moreover, there is no such thing as
"the" translation between ASCII and EBCDIC: there are multiple different
character sets on both sides, and a real conversion isn't possible
without knowing the actual character sets involved, which depends on the
context, which in turn is not known at this low level.
Unless I'm mistaken, this implies that you indeed need to
consider the (basic) EBCDIC layout as an alternative to the (basic) ASCII
layout directly within the RTL. Obviously, this should be done in a way
not impacting the RTL behaviour for other targets. That is certainly
possible, using either constants differentiating between ASCII and EBCDIC
(the compiler eliminates the code identified as unused at compile time),
or IFDEFs, or special versions of certain routines (located in different
include files, etc.). I personally believe that the last option is the
least viable one in this case, because most of the code within such
routines would likely be duplicated in the two implementations.
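
To make the layout differences concrete, here is a minimal sketch. It is
in Python rather than Pascal only because Python's standard library ships
EBCDIC codecs, so it can be run as is. cp037 is an assumption (just one
EBCDIC code page out of many), and the helper names (code,
alphabet_is_contiguous, hex_digit_value) are made up for this
illustration; they are not anything from our RTL.

```python
# Illustration only. cp037 is ONE EBCDIC code page, chosen here as an
# assumption; other EBCDIC code pages place some characters elsewhere.
EBCDIC = "cp037"


def code(ch, charset):
    """Return the single-byte code of character ch in charset."""
    return ch.encode(charset)[0]


def alphabet_is_contiguous(charset):
    """True if 'a'..'z' occupy 26 consecutive codes in charset."""
    codes = [code(c, charset) for c in "abcdefghijklmnopqrstuvwxyz"]
    return codes == list(range(codes[0], codes[0] + 26))


def hex_digit_value(ch, charset):
    """Value of a hex digit, using the charset's own codes for '0', 'a'
    and 'A' as constants instead of hard-coded ASCII numbers."""
    c = code(ch, charset)
    for first, value, count in (("0", 0, 10), ("a", 10, 6), ("A", 10, 6)):
        offset = c - code(first, charset)
        if 0 <= offset < count:
            return value + offset
    raise ValueError("not a hexadecimal digit: %r" % ch)


if __name__ == "__main__":
    for name in ("ascii", EBCDIC):
        print(name,
              "tab=%d" % code("\t", name),
              "lf=%d" % code("\n", name),
              "a=%d" % code("a", name),
              "contiguous a..z:", alphabet_is_contiguous(name))
```

The hex conversion works for both layouts only because '0'..'9',
'a'..'f' and 'A'..'F' each form a consecutive run in ASCII as well as in
EBCDIC. A range check over the whole 'a'..'z' would still need separate
handling, because in EBCDIC it would also accept the non-letter codes in
the gaps.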

Then there is the additional "minor" issue of certain characters
(critical for Pascal source code) not necessarily being directly
available in _some_ (!) EBCDIC character sets, as pointed out by Mark.
Again, this cannot be handled in the general I/O routines, because it
only becomes important when interpreting a general text as Pascal source
code. In this case, special support on the compiler side will probably be
necessary, i.e. this should have no impact on the RTL, but it will again
have an impact on the common parts of the compiler, namely the scanner,
not on target specific units.
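
As a rough illustration of why such characters are awkward, the following
sketch (again Python, for the same reason of readily available codecs)
compares the byte values that a few EBCDIC code pages assign to
characters used in Pascal source. The choice of cp037, cp500 and cp273 is
an assumption, and positions and is_stable are hypothetical helper names.
Note that Python's tables for these code pages contain all of the
characters, so this only shows that their positions vary; it does not
show a character set in which they are missing altogether.

```python
# Assumption: these three code pages stand in for "some EBCDIC character
# sets"; Python's tables for them contain every character tested here.
CODE_PAGES = ("cp037", "cp500", "cp273")


def positions(ch):
    """Map each code page name to the byte value it assigns to ch."""
    return {cp: ch.encode(cp)[0] for cp in CODE_PAGES}


def is_stable(ch):
    """True if ch has the same byte value in all listed code pages."""
    return len(set(positions(ch).values())) == 1


if __name__ == "__main__":
    for ch in "[]{}^a0":
        print(repr(ch), positions(ch),
              "stable" if is_stable(ch) else "varies")
```

Letters and digits come out the same everywhere, while at least the
square bracket does not, which is why the interpretation has to happen
where the text is known to be Pascal source.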

Tomas