[fpc-devel] On a port of Free Pascal to the IBM 370

Tomas Hajny XHajT03 at hajny.biz
Wed Jan 18 18:37:34 CET 2012


On Wed, January 18, 2012 17:32, Felipe Monteiro de Carvalho wrote:
> 2012/1/18 Tomas Hajny <XHajT03 at hajny.biz>:
>> As pointed out in my other e-mail, "everywhere necessary" implies either
>> "dear user, convert all your files from the original encoding before you
>> want programs created in FPC to touch them"
>
> Yes, no problem here. I assume there must be some program on this
> platform which can edit ASCII text and another one (or the same one)
> to convert text files between encodings. If not, just use the new port
> to cross-compile such a program =)

I don't know the typical situation on S/370 machines nowadays, so the
following comment may not be valid. However: how likely would you be to
use some new program if any interchange of files between it and whatever
other software you use on your machine required converting those files
back and forth manually all the time (considering that the existing
software may be the main reason for still using this kind of platform at
all)?


>> or "dear programmer targetting
>> OS/370, make sure that your programs are limited in what RTL functions
>> you
>> use, or convert all locally stored files to ASCII and only use the RTL
>> functions for text processing on the converted copies". Otherwise even
>> stuff like line by line reading or field by field reading of the input
>> text file using standard RTL routines may not work as expected with the
>> current RTL.
>
> I don't see why. A text encoding is just a text encoding, one of the
> hundreds of obsolete ones in existence, and the only sane way of
> handling text in cross-platform applications is Unicode.

My point is not about cross-platform applications here. My point is about
applications running natively on z/OS / OS/370 / ... (whether these
applications should also be cross-platform is, in my opinion, a second
step).


> The RTL could ship with a UTF-8 <-> EBCDIC converter and define UTF-8
> as the platform encoding. Detect which exact format the platform is
> using at runtime if necessary and convert everywhere necessary. This
> should cover all characters imaginable and all control characters too.
>
> What could go wrong here? This is what Java does on all its platforms.

I don't know how it works for Java, but I know that it cannot work
transparently in the current FPC RTL without making at least some changes
in the common (so far platform-independent) parts. Most likely something
similar (i.e. changes to otherwise platform-independent RTL parts)
happened to Java too when it was ported to S/370, or the design of its
run-time included such considerations from the beginning (which I
personally doubt ;-) ).
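
Just to make the "transparent conversion" idea more tangible: what Felipe
describes would boil down to something like the following purely
hypothetical sketch (DetectPlatformCodePage and TranslateToUtf8 are
invented names and the conversion itself is stubbed out), and the common
text handling code in rtl/inc/text.inc currently offers no hook where such
a translation step could be attached from platform-specific units alone:

program TranslateHookSketch;
{$mode objfpc}{$H+}

type
  TCodePage = (cpAscii, cpEbcdic);

function DetectPlatformCodePage: TCodePage;
begin
  { on z/OS this would query the job's CCSID; stubbed here for illustration }
  Result := cpEbcdic;
end;

function TranslateToUtf8(const Raw: AnsiString; cp: TCodePage): AnsiString;
begin
  if cp = cpAscii then
    Result := Raw
  else
    { a real implementation would convert EBCDIC to UTF-8 here;
      in this sketch the data is passed through unchanged }
    Result := Raw;
end;

var
  cp: TCodePage;
begin
  cp := DetectPlatformCodePage;
  WriteLn(TranslateToUtf8('line as delivered by FileReadFunc', cp));
end.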


> As for WriteLn / ReadLn, if one really wants to allow inputting control
> codes directly, one could either make them use UTF-8 and offer an
> alternative RawWriteLn / RawReadLn for raw input of control codes, or
> leave them sending raw text and expose the routines to convert UTF-8
> <-> EBCDIC.

The point is not about a programmer interested in inputting control codes
directly (although that may be a valid scenario too if the programmer
wants to work the same way he is used to on other platforms). The point is
that the common parts of the FPC RTL have e.g. #9 hard-coded as the tab
character (which in turn controls how fields in text files may be
separated from each other) and #10 as the line feed character, and that
this happens at some point during the transition from generic "binary file
I/O" to "text file processing" in the common parts of the standard FPC RTL
(as it stands now).
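
To make this concrete, here is a small self-contained illustration (not
RTL code - the real logic is spread over rtl/inc/text.inc - but the byte
values are the usual code page 037 ones) showing that the hard-coded ASCII
tests simply never match anything in EBCDIC data:

program ControlCodeMismatch;

const
  AsciiTab  = #9;    { what the generic RTL code expects as field separator }
  AsciiLF   = #10;   { what it expects as end of line }
  EbcdicTab = #$05;  { HT in EBCDIC }
  EbcdicNL  = #$15;  { NL in EBCDIC }

var
  EbcdicLine: AnsiString;
  i: Integer;
  Found: Boolean;
begin
  { 'AB<tab>CD<newline>' as the bytes would appear in an EBCDIC file
    (#$C1 = 'A', #$C2 = 'B', #$C3 = 'C', #$C4 = 'D' in code page 037) }
  EbcdicLine := #$C1#$C2 + EbcdicTab + #$C3#$C4 + EbcdicNL;

  Found := False;
  for i := 1 to Length(EbcdicLine) do
    if (EbcdicLine[i] = AsciiTab) or (EbcdicLine[i] = AsciiLF) then
      Found := True;

  { never triggers: the ASCII #9 / #10 tests match nothing in EBCDIC data,
    so line-by-line or field-by-field reading would scan right past the
    intended boundaries }
  if not Found then
    WriteLn('no ASCII control codes found in the EBCDIC data');
end.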

BTW, even if one considered e.g. translating from EBCDIC to UTF-8 in
FileReadFunc (modified from its current standard platform-independent
implementation), because that is the place where generic files begin to be
interpreted as textual content in our implementation of the standard
Pascal RTL functions, it would still fail in other routines in
rtl/inc/text.inc due to the difference between the length read from the
original file stored in EBCDIC and the size needed in the text file buffer
after the translation to UTF-8.
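
A purely illustrative converter sketch (only a handful of code page 037
characters, the #$9F mapping to the currency sign is my assumption, and
the name is made up - this is not proposed RTL code) shows where that size
difference comes from:

function EbcdicToUtf8(const Src: AnsiString): AnsiString;
var
  i: Integer;
  s: AnsiString;
begin
  s := '';
  for i := 1 to Length(Src) do
    case Src[i] of
      #$40:       s := s + ' ';                                { space }
      #$C1..#$C9: s := s + Chr(Ord('A') + Ord(Src[i]) - $C1);  { 'A'..'I' }
      #$F0..#$F9: s := s + Chr(Ord('0') + Ord(Src[i]) - $F0);  { '0'..'9' }
      #$9F:       s := s + #$C2#$A4;  { assumed CP037 currency sign:
                                        1 byte in, 2 bytes out in UTF-8 }
    else
      s := s + '?';
    end;
  EbcdicToUtf8 := s;
end;

For input containing such characters, Length(EbcdicToUtf8(Src)) exceeds
Length(Src), so the number of bytes placed in the text buffer no longer
matches the number of bytes reported as read from the file.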

Again - I'm sure multiple solutions exist, but I cannot imagine how it
could work reasonably well without touching the common parts of the FPC
RTL at least a bit (unfortunately at multiple places which may be hard to
find), except for a limited 'proof of concept' solution not meant to be
used in standard ways (obviously, limiting all text file I/O to files
encoded in ASCII or Unicode, and not doing any console I/O if the console
cannot support ASCII or Unicode directly, may be perfectly acceptable in
such a 'proof of concept' mode).

Tomas




