[fpc-devel] Re. z370 Cross Compilation, Pass 2 of ....

Sun Sep 1 16:39:31 CEST 2013

On 01.09.2013 16:02, Mark Morgan Lloyd wrote:
> Bernd Oppolzer wrote:
>
> I'm about to head out, so have to be extremely brief.
>
>> Thank you very much for that; it made things much clearer for me.
>>
>> So the compiler relies heavily on the external assembler and the
>> syntax it supports, as long as you don't want to make changes to
>> step 2 (that is, change the linear assembler representation, which
>> IMO should not be done in the first step).
>>
>> And: the assembler is not called once, but for every unit.
>>
>> So here, I think, we have some problems, because, as already pointed
>> out, the z-Arch doesn't have PUSH and POP instructions, and I guess
>> that the linear assembler representation will not map very well to
>> what the z-Arch instruction set provides, although by now there are
>> some 1500 instructions.
>>
>> Understanding that, I would now like some description of the linear
>> assembler representation that FPC generates: it is of course not
>> target-specific, but it does of course make some assumptions about
>> the type of the underlying hardware.
>
> Look at the output when using FPC's -a options, for example -aln... 
> that might in practice need the EXTDEBUG setting during compilation 
> but I can't go into more detail now.
>
> Push will typically be used to put parameters onto the stack, 
> otherwise they'll be accessed by indexed operation. The stack frame is 
> discarded by target-specific code.
>

Thank you for that; I will take a look at it, although I have some doubts
whether the output is "target-specific" or "not target-specific", and
whether my understanding of the linear assembler representation as "not
target-specific" is right.

On that question I would like a statement from the core developers:
how would you deal with a machine that has no built-in PUSH instruction?
For example, suppose a function call puts five parameters on the stack:

LD A
PUSH
LD B
PUSH
LD C
PUSH
LD D
PUSH
LD E
PUSH
CALL FUNC

This assumes an accumulator that is the target of LD, and a PUSH
instruction that pushes the content of the accumulator onto the stack.

As I understand it, this could be the non-target-specific representation
of the calling sequence.

On z-Arch, the compiler could produce something like this:

L   R5,A
AHI  R1,4
ST R5,0(R1)
L   R5,B
AHI  R1,4
ST R5,0(R1)
L   R5,C
AHI  R1,4
ST R5,0(R1)
L   R5,D
AHI  R1,4
ST R5,0(R1)
L   R5,E
AHI  R1,4
ST R5,0(R1)

Here every PUSH is emulated by an AHI (incrementing the "stack pointer"
R1) followed by an indirect store.
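To make the one-by-one translation concrete, here is a small Python sketch (not FPC code) that lowers the generic LD/PUSH sequence one PUSH at a time. The tuple-based IR, the helper names, and the register roles (R5 as the accumulator stand-in, R1 as the emulated stack pointer) are assumptions taken from the example above, not from FPC's internals.

```python
# Hypothetical generic IR for the calling sequence: ("LD", var) loads the
# accumulator and ("PUSH",) pushes it. Illustrative only, not FPC's IR.
ACC, SP, WORD = "R5", "R1", 4  # accumulator stand-in, emulated SP, fullword size


def ins(op, args):
    """Format one instruction with the mnemonic padded to four columns."""
    return f"{op:<4}{args}"


def lower_naive(ir):
    """Translate every PUSH on its own: bump SP by one word, then store through it."""
    out = []
    for op in ir:
        if op[0] == "LD":
            out.append(ins("L", f"{ACC},{op[1]}"))
        elif op[0] == "PUSH":
            out.append(ins("AHI", f"{SP},{WORD}"))
            out.append(ins("ST", f"{ACC},0({SP})"))
        else:
            raise ValueError(f"unsupported op: {op}")
    return out


ir = [step for name in "ABCDE" for step in (("LD", name), ("PUSH",))]
print("\n".join(lower_naive(ir)))
```

Five parameters cost fifteen instructions here, five of them stack pointer updates.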

A more efficient sequence would be:

L   R5,A
ST R5,0(R1)
L   R5,B
ST R5,4(R1)
L   R5,C
ST R5,8(R1)
L   R5,D
ST R5,12(R1)
L   R5,E
ST R5,16(R1)
AHI  R1,20
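The same hypothetical IR can be lowered the cheaper way with a running offset: each PUSH becomes a store at the next free displacement, and the stack pointer is adjusted once at the end. This is a sketch under the same assumptions (R5 as accumulator, R1 as emulated stack pointer, 4-byte words), not a description of how FPC would do it.

```python
ACC, SP, WORD = "R5", "R1", 4  # assumed registers and fullword size


def ins(op, args):
    """Format one instruction with the mnemonic padded to four columns."""
    return f"{op:<4}{args}"


def lower_coalesced(ir):
    """Store each pushed value at a growing offset, then adjust SP once."""
    out, offset = [], 0
    for op in ir:
        if op[0] == "LD":
            out.append(ins("L", f"{ACC},{op[1]}"))
        elif op[0] == "PUSH":
            out.append(ins("ST", f"{ACC},{offset}({SP})"))
            offset += WORD
        else:
            raise ValueError(f"unsupported op: {op}")
    if offset:
        # Single stack pointer update for the whole parameter block.
        out.append(ins("AHI", f"{SP},{offset}"))
    return out


ir = [step for name in "ABCDE" for step in (("LD", name), ("PUSH",))]
print("\n".join(lower_coalesced(ir)))
```

The sketch ignores the 4095-byte limit of the 12-bit displacement in ST, which would only matter for very large parameter blocks.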

It becomes even more efficient if you use other registers besides R5:
if the values are loaded into consecutive registers (R5, R6, R7 and so
on), they can all be stored to the stack with a single STM (Store
Multiple) instruction.
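A minimal Python sketch of the STM variant, assuming (hypothetically) that R5 through R9 are free to hold parameters and R1 is the emulated stack pointer. Parameters are loaded into consecutive registers; each batch is written with one STM, and batches beyond five registers simply start over at R5.

```python
SP, WORD = "R1", 4          # emulated stack pointer and fullword size
FIRST_REG, LAST_REG = 5, 9  # assumption: R5 through R9 are free for parameters


def ins(op, args):
    """Format one instruction with the mnemonic padded to four columns."""
    return f"{op:<4}{args}"


def lower_stm(params):
    """Load parameters into consecutive registers and store each batch with one STM."""
    out, offset = [], 0
    batch = LAST_REG - FIRST_REG + 1
    for start in range(0, len(params), batch):
        chunk = params[start:start + batch]
        for i, name in enumerate(chunk):
            out.append(ins("L", f"R{FIRST_REG + i},{name}"))
        last = FIRST_REG + len(chunk) - 1
        if last == FIRST_REG:
            # A single value needs only a plain ST.
            out.append(ins("ST", f"R{FIRST_REG},{offset}({SP})"))
        else:
            # STM r1,r3,d(b) stores r1..r3 into consecutive fullwords at d(b).
            out.append(ins("STM", f"R{FIRST_REG},R{last},{offset}({SP})"))
        offset += WORD * len(chunk)
    if offset:
        out.append(ins("AHI", f"{SP},{offset}"))
    return out


print("\n".join(lower_stm(list("ABCDE"))))
```

For the five-parameter example this gives five loads, one STM R5,R9,0(R1) and one AHI R1,20.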

That's what the existing compilers on z-Arch normally do: they don't
translate the PUSH instructions one by one as in the first example.
Instead, since the hardware provides no PUSH/POP instructions, they make
an effort to adjust the stack pointer only once (as outlined above), and
that adjustment is done in the procedure or function prologue.
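The prologue approach can be sketched as follows: the procedure reserves one outgoing-parameter area, sized for the call with the most parameters, and each call then stores its parameters at fixed offsets with no per-call stack pointer update. This is a hedged Python illustration, not an ABI description; it assumes R1 addresses the start of the reserved area throughout the body, uses R5 as a scratch register, and shows the prologue reservation and the call itself only as HLASM-style comment lines.

```python
SP, WORD = "R1", 4  # emulated stack pointer and fullword size (assumptions)


def ins(op, args):
    """Format one instruction with the mnemonic padded to four columns."""
    return f"{op:<4}{args}"


def lower_procedure(calls):
    """calls: list of (routine, [param names]) in the order they occur.

    The area is reserved once (in the prologue) for the largest call;
    every call reuses it at offsets 0, 4, 8, ... relative to R1.
    """
    area = WORD * max((len(p) for _, p in calls), default=0)
    out = [f"* prologue: reserve {area} bytes for outgoing parameters"]
    for routine, params in calls:
        for i, name in enumerate(params):
            out.append(ins("L", f"R5,{name}"))
            out.append(ins("ST", f"R5,{i * WORD}({SP})"))
        out.append(f"* call {routine}")  # actual call instruction omitted
    return area, out


area, code = lower_procedure([("FUNC", list("ABCDE")), ("G", ["X", "Y"])])
print("\n".join(code))
```

Here the second call reuses the first 8 bytes of the 20-byte area, and no AHI appears in the body at all.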

Now my question is:

Do you think that this is a major problem for an FPC port to z-Arch?

Are my assumptions right so far?

Should we start with an easy solution and check the performance
implications later? Maybe there is a clever solution to that ...

Kind regards

Bernd