[fpc-devel] Re: EBCDIC ( was On a port of Free Pascal to the IBM 370)

Mark Morgan Lloyd markMLl.fpc-devel at telemetry.co.uk
Tue Jan 31 09:28:49 CET 2012


Hans-Peter Diettrich wrote:
> Sven Barth schrieb:
>> On 30.01.2012 20:31, steve smithers wrote:
>>>> Hans-Peter Diettrich wrote on Mon, 30 Jan 2012 17:40:27 +0100
>>>> Existing source code frequently assumes ASCII encoding. The obvious
>>>> cases are upper/lowercase conversions, done by and/or or add/sub of
>>>> constant values on the characters. It will be hell to find and fix all
>>>> such code in the compiler and RTL, even if only the constants have to
>>>> be modified for EBCDIC. Even code that assumes the order of common
>>>> characters (' ' < '0' < 'A' < 'a') has to be found and fixed manually -
>>>> how would you even *find* code with such implicit assumptions?
>>>
>>> It does indeed.  I am aware of the problems inherent in this.  But the
>>> RTL has to be more or less rewritten anyway to support OS.  OS is a
>>> very different animal to Windows or Linux.
>>
>> The RTL consists of two parts (though the border is not easily
>> visible): a platform independent one and a platform dependent one. A
>> port to a different target normally only involves touching the
>> platform dependent one, but a port to 370 also requires touching the
>> platform independent one. This is what DoDi talks about.
> 
> It's not anything the compiler could solve. Consider what will happen
> with, e.g.:
>   for c := 'A' to 'Z' do ...
>   for c := '0' to 'Z' do ...
> (where the literals 'A' etc. could be named constants, or computed values)
> 
> With EBCDIC encoding the second loop will never be entered!
> 
>> @other devs: Could the code page aware AnsiString type be of any help 
>> here?
> 
> Only on the I/O side, when files are read/written, or when strings
> (filenames!) are sent or received via the OS API. The latter reminds me
> of the Windows OEM charset, used in console I/O, which could be
> repurposed to mean EBCDIC on IBM consoles.
> 
> Unfortunately the Encoding is available only with *strings*, not with 
> single characters. New types like EBCDICchar could be introduced, 
> different from AnsiChar, and a directive telling the compiler "literals 
> are EBCDIC" or "Char is EBCDICchar".
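
To make DoDi's two loops concrete, here is a minimal sketch that can be
built on an ordinary ASCII host. It walks the same ranges using numeric
EBCDIC code points (I am assuming code page 037; other EBCDIC variants
differ in places). The names IsEbcdicUpper and WalkRange are made up for
this illustration and are not part of the RTL.

```pascal
program EbcdicLoops;
{$mode objfpc}

const
  { EBCDIC (code page 037, an assumption) code points, written as numbers
    so that the sketch behaves the same on an ASCII host. }
  E_A    = $C1;  { 'A' }
  E_Z    = $E9;  { 'Z' }
  E_ZERO = $F0;  { '0' }

{ Hypothetical helper: the upper-case letters sit in three separate runs. }
function IsEbcdicUpper(b: Byte): Boolean;
begin
  Result := (b in [$C1..$C9]) or (b in [$D1..$D9]) or (b in [$E2..$E9]);
end;

{ Hypothetical helper: counts what "for c := first to last" would visit. }
procedure WalkRange(first, last: Byte; out visited, letters: Integer);
var
  b: Byte;
begin
  visited := 0;
  letters := 0;
  for b := first to last do
  begin
    Inc(visited);
    if IsEbcdicUpper(b) then
      Inc(letters);
  end;
end;

var
  visited, letters: Integer;
begin
  { for c := 'A' to 'Z': entered, but it also visits non-letters }
  WalkRange(E_A, E_Z, visited, letters);
  WriteLn('A..Z: ', visited, ' code points visited, ', letters, ' letters');

  { for c := '0' to 'Z': start is above end, so the body never runs }
  WalkRange(E_ZERO, E_Z, visited, letters);
  WriteLn('0..Z: ', visited, ' code points visited');
end.
```

Under those assumptions the first range visits 41 code points of which
only 26 are letters, and the second visits none, so even the "working"
loop is not safe if its body assumes every value is a letter.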

I'd suggest that the thing to do is first to target the compiler at
Linux, i.e. ASCII, hosted on a PC. Once that is working adequately,
branch the RTL for EBCDIC, with the intention that the branch is
basically a set of conversion patches and that the master remains ASCII.
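
As a rough idea of what one such conversion patch might look like, here
is a sketch of an upper-casing routine with an ASCII body and an EBCDIC
body selected at compile time. The symbol TARGET_EBCDIC and the function
UpCaseByte are hypothetical, the EBCDIC values assume code page 037, and
nothing here is taken from the actual RTL.

```pascal
program UpCasePatch;
{$mode objfpc}

{ Hypothetical routine: upper-cases one code point without assuming that
  the encoding is ASCII. Build with -dTARGET_EBCDIC to select the EBCDIC
  variant; TARGET_EBCDIC is an invented symbol, not an existing define. }
function UpCaseByte(b: Byte): Byte;
begin
{$ifdef TARGET_EBCDIC}
  { Lower-case letters sit in three runs; upper case is $40 higher. }
  if (b in [$81..$89]) or (b in [$91..$99]) or (b in [$A2..$A9]) then
    Result := b + $40
  else
    Result := b;
{$else}
  { ASCII: one contiguous run; upper case is $20 lower. }
  if b in [$61..$7A] then
    Result := b - $20
  else
    Result := b;
{$endif}
end;

begin
{$ifdef TARGET_EBCDIC}
  WriteLn(UpCaseByte($81));  { EBCDIC 'a' }
{$else}
  WriteLn(UpCaseByte($61));  { ASCII 'a' }
{$endif}
end.
```

The point of the sketch is only that the master copy keeps the ASCII
body unchanged, while the branch carries the alternative constants and
ranges in one place rather than scattered through the code.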

Or if this isn't acceptable because the IBM developers feel we're trying
to force them into our image, let's meet halfway and use Solaris, which
nobody really enjoys.

-- 
Mark Morgan Lloyd
markMLl .AT. telemetry.co .DOT. uk

[Opinions above are the author's, not those of his employers or colleagues]


