[fpc-devel] Re: EBCDIC ( was On a port of Free Pascal to the IBM 370)

Mon Jan 30 20:31:05 CET 2012

> Hans-Peter Diettrich wrote on Mon, 30 Jan 2012 17:40:27 +0100
> Existing source code frequently assumes ASCII encoding. The obvious are 
> upper/lowercase conversions, by and/or or add/sub constant values to the 
> characters. It will be hell to find and fix all such code in the 
> compiler and RTL, even if only the constants have to be modified for 
> EBCDIC. Even code with the assumed order of common characters (' ' < '0' 
> < 'A' < 'a') has to be found and fixed manually - how would you even 
> *find* code with such implicit assumptions?

It does indeed, and I am aware of the problems inherent in this.  But the RTL
has to be more or less rewritten anyway to support OS, which is a very different
animal from Windows or Linux.

You would start with various searches using grep or similar, scanning for code
that uses character constants like '#7' and changing them to named constants such as
fpc_Char_Bell, which would live in an fpcASCII or fpcEBCDIC unit.  You would search
for all the combinations you could think of: ['a'..'z'], ['A'..'Z'] and so on.  Finally,
having exhausted your ingenuity, you would be left with the old stand-by of testing.

A God-awful task, I know.  But what's the alternative?  A note in the documentation
for FreePascal/MVS saying that whenever you reference any external data it is the user's
responsibility to convert from ASCII to EBCDIC?  Really?  AssignFile(f,'SYS1.PARMLIB'):
sorry, doesn't work, you forgot the ASCII conversion.  WRITELN('Hello World') produces
garbage on the user's terminal.  Who will they blame then?  JobSubmit(asciifile) will
disappear from the face of the planet because JES won't have a clue what to do with
an ASCII file.

You can't convert automatically because you don't necessarily know whether the user
is writing ASCII, EBCDIC or binary.  What happens to a record like this?

  MyRec = record
     Field1 : string;
     Field2 : char;
     Field3 : integer;
     end;

And if we are using ASCII, should we be using little-endian numbers too?
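
To make the record problem concrete, here is a rough Python sketch.  The fixed
layout (8-byte text field, 1-byte char, 4-byte big-endian integer) is my assumption
for illustration; a real Pascal string and integer would be laid out differently.
Code page 037 stands in for "EBCDIC".  The point is only that a byte-for-byte
translation cannot tell text from binary.

```python
import struct

# Assumed layout for MyRec: Field1 = 8 bytes of text, Field2 = 1 char,
# Field3 = 4-byte big-endian signed integer. Hypothetical, for illustration.
LAYOUT = ">8s1si"


def to_ebcdic_fieldwise(raw):
    """Translate only the text fields; leave the integer bytes alone."""
    field1, field2, field3 = struct.unpack(LAYOUT, raw)
    return struct.pack(
        LAYOUT,
        field1.decode("ascii").encode("cp037"),
        field2.decode("ascii").encode("cp037"),
        field3,
    )


def to_ebcdic_blind(raw):
    """Translate every byte as if it were text, the way an automatic
    conversion with no knowledge of the layout would."""
    return raw.decode("latin-1").encode("cp037")


raw = struct.pack(LAYOUT, b"HELLO   ", b"X", 65)

good = struct.unpack(LAYOUT, to_ebcdic_fieldwise(raw))
bad = struct.unpack(LAYOUT, to_ebcdic_blind(raw))

print(good[2])  # integer survives the field-wise conversion
print(bad[2])   # integer is corrupted by the blind conversion
```

Only code that knows which fields are text can do the conversion safely, and a
generic file transfer or I/O layer does not know that.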

> Next come character ranges, where letters are assumed contiguous in all 
> existing code and examples. Clearly this is true only for ASCII 
> ('a'..'z'), not for national characters like 'ä' or 'é', but the 
> compiler assumes ASCII source encoding all over. Fixing the set 
> constructor to make Set Of Char work with EBCDIC will be a challenge.
> 
> When a user e.g. picks up such example or library code from somewhere, 
> and finds that it doesn't work, he'll blame the compiler for malfunction.
> 
> An EBCDIC based compiler will disallow the use of any foreign libraries, 
> because a simple (syntactic) conversion from ASCII to EBCDIC encoding 
> doesn't cover beforementioned (semantic) issues :-(

A compiler is not just a tool for syntax analysis.  It has semantic routines built
in already.  It's up to us to use enough ingenuity to cater for as many of these
cases as possible.  Surely it should be possible to pick up stuff like 'a'..'z' at
compile time.
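
As a rough illustration of what picking up 'a'..'z' would have to deal with, the
Python sketch below compares the naive code-point range with the set of letters
the programmer meant.  Code page 037 is an assumption on my part (the thread does
not say which EBCDIC code page a port would target), and code() is just a helper
for the example.

```python
import string


def code(ch):
    """EBCDIC (code page 037) code point of a single character."""
    return ch.encode("cp037")[0]


# What 'a'..'z' means if taken literally as a range of EBCDIC code points
naive = set(range(code("a"), code("z") + 1))

# What the programmer meant: exactly the 26 lowercase letters
intended = {code(ch) for ch in string.ascii_lowercase}

# Code points that fall inside the range but are not letters
strays = sorted(naive - intended)

print(len(naive), len(intended), [hex(c) for c in strays])
```

In EBCDIC the letters sit in three separate runs (a-i, j-r, s-z), so the literal
range sweeps up the code points in the gaps.  A compiler that recognised a
letter-to-letter range could expand it to the intended set instead.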

Regards
Steve
> 
> Mark Morgan Lloyd wrote:

> I repeat: IBM is now happily using ASCII on zSeries. That includes the 
> CDSL system made available to developers 
> http://www-03.ibm.com/systems/z/os/linux/support/community.html

Yes.  The Community Development System for Linux on System z would use ASCII.  As
we have already ascertained, Linux/390 is an ASCII system; using EBCDIC there would be
slightly south of stupid.

CDSL doesn't run on OS, except possibly under USS.  Does anyone know if USS is
ASCII or EBCDIC?

(USS is Unix System Services, formerly known as OpenEdition MVS.  It's a sort of Unix
look-alike that runs under versions of OS from MVS/ESA SP 4.3 onwards.)

> I think the reason for producing an ASCII version first is very simple:

Converting the source from ASCII to EBCDIC isn't a huge problem.  There are 
many much larger problems ahead :)

> 
> No - sending source code from a PC to a 370 performs an automatic translation to EBCDIC (and vice versa).
> 
It depends on what you use to do the transfer and what options you specify.  These utilities
are normally configurable.  FTP and IND$FILE, the two I've used in the past, both are.

> IBM 370 doesn't use ASCII, anywhere, but it has a hardware instruction (TRT - Translate and Test)
> which can convert between character sets in a single instruction using a suitable table. 

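Translate and Test wouldn't help.  Despite the name it doesn't actually do any translation
as such: it scans the operand against a table and stops at the first byte whose table entry
is non-zero, leaving the data unchanged.  The instruction you meant was TR (Translate).
To quote: "Sorry, pedantry strong this one runs."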
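
For anyone who hasn't met the two instructions, here is a rough software analogue in
Python.  The tr() and trt() helpers are my own stand-ins, not models of the exact
hardware semantics (condition codes, register usage and length limits are ignored),
and code page 037 is again an assumed choice of EBCDIC.

```python
# 256-byte tables: entry i holds the replacement for input byte i.
ASCII_TO_EBCDIC = bytes(range(256)).decode("latin-1").encode("cp037")
EBCDIC_TO_ASCII = bytes(range(256)).decode("cp037").encode("latin-1")


def tr(data, table):
    """TR-like: replace every byte with its table entry."""
    return data.translate(table)


def trt(data, table):
    """TRT-like: find the first byte whose table entry is non-zero.

    Returns (index, table entry), or None if no byte qualifies.
    The data itself is never modified.
    """
    for index, byte in enumerate(data):
        if table[byte] != 0:
            return index, table[byte]
    return None


ebcdic = tr(b"Hello", ASCII_TO_EBCDIC)
round_trip = tr(ebcdic, EBCDIC_TO_ASCII)

# A scan table that flags only the comma
comma_table = bytearray(256)
comma_table[ord(",")] = 1
first_comma = trt(b"ab,cd", comma_table)

print(ebcdic.hex(), round_trip, first_comma)
```

So TR with a suitable table does the character-set conversion, while TRT is the
tool for finding delimiters or invalid characters in a string.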

--
Regards
Steve