[fpc-devel] i386-linux switched to a 16 byte aligned stack

Tue Sep 17 07:24:22 CEST 2019

Ah whoops, misunderstood.  Only for i386-linux, not i386-win32 as well.  
Would there be benefits to aligning the stack on that platform as well 
though?

Gareth aka. Kit

On 16/09/2019 13:32, J. Gareth Moreton wrote:
> It's a useful feature as far as hand-written and generated assembly 
> language is concerned.  The Intel SIMD instruction sets work far 
> better with aligned memory (e.g. you can use MOVAPS instead of MOVUPS, 
> the former being faster on older CPUs but triggering a segmentation 
> fault if the memory is unaligned). Granted, vectorcall currently 
> only works on x86_64-win64 because I was able to re-use the code for 
> the System V ABI, but an aligned stack might make it easier to port 
> to i386-win32 eventually. Under Microsoft Visual C++, __vectorcall is 
> supported on 32-bit platforms by only using ECX and EDX as the integer 
> registers, the same as __fastcall. Speaking of 'fastcall', I do wonder 
> if it's worth implementing that calling convention in case one wants 
> to communicate with an external library that uses it.
>
> Gareth aka. Kit
>
> On 15/09/2019 21:07, Florian Klämpfl wrote:
>> On 15.09.19 at 19:35, Florian Klämpfl wrote:
>>> In r43005 to 43014 I committed a couple of patches so that FPC 
>>> generates stack frames aligned to 16-byte boundaries on i386-linux 
>>> (before a call instruction, esp is divisible by 16). This is done 
>>> because it seems that Linux libraries are starting to depend on this 
>>> property, which gcc has ensured for around 20 years. To ensure this, 
>>> FPC uses the same approach as clang (and as FPC uses for 
>>> i386-darwin): esp has a fixed value fulfilling the alignment 
>>> requirements during the whole procedure. Outgoing parameters are 
>>> copied onto the stack by mov instead of push instructions. The 
>>> consequences of these changes are:
>>> - For pure Pascal programs, this does not change anything. The 
>>> resulting code might be slightly bigger, but in turn floating point 
>>> code might be faster, as double values can be properly aligned now.
>>> - Most assembler code is not affected by the change. Only code using 
>>> constants to access the stack via esp might be affected; such code 
>>> is rare.
>>> - Assembler code calling other procedures should be adapted to keep 
>>> the stack aligned to 16 byte boundaries as well. Assembler code 
>>> working on i386-darwin fulfills this requirement already. The define 
>>> FPC_STACKALIGNMENT contains the alignment of the stack (16 in the 
>>> case of i386-linux).
>>> - To test if the stack is always properly aligned, compile with -Ct: 
>>> the stack checking code for i386-linux checks the stack alignment 
>>> now as well.
>>
>> One thing (and actually an important one) I forgot to mention: this 
>> also means that the regcall calling conventions we use by default on 
>> i386-linux now use a caller-cleared stack. I forgot about it because 
>> even our regression tests did not catch this. OTOH, it means that 
>> probably little code out there is affected by this; an exception 
>> might be PascalScript.
>>
>> _______________________________________________
>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
>
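
The alignment rule described in the thread (esp divisible by 16 immediately before a call, with esp held at one fixed value for the whole procedure) comes down to a little arithmetic. The following is only a rough sketch in C, not FPC's actual code generator logic. The helper names (is_aligned, align_up, frame_size) are hypothetical, and it assumes a prologue that pushes ebp, so that 8 bytes (return address plus saved ebp) are already on the stack before the single "sub esp, N".

```c
#include <assert.h>
#include <stdint.h>

/* The thread states FPC_STACKALIGNMENT is 16 on i386-linux. */
#define STACK_ALIGNMENT 16u

/* Nonzero if a stack pointer value is a multiple of the alignment. */
static int is_aligned(uint32_t esp)
{
    return (esp & (STACK_ALIGNMENT - 1u)) == 0u;
}

/* Round a byte count up to the next multiple of the alignment. */
static uint32_t align_up(uint32_t size)
{
    /* Guard against wrap-around in the addition below. */
    assert(size <= UINT32_MAX - (STACK_ALIGNMENT - 1u));
    return (size + STACK_ALIGNMENT - 1u) & ~(STACK_ALIGNMENT - 1u);
}

/*
 * Hypothetical helper: the N for one "sub esp, N" in the prologue.
 * Assumption: esp was aligned just before the caller's call instruction,
 * so on entry the 4-byte return address is on top of the stack, and the
 * prologue then pushes ebp (4 more bytes). Space for outgoing parameters
 * is reserved up front because they are stored with mov, not push.
 */
static uint32_t frame_size(uint32_t locals, uint32_t outgoing_args)
{
    const uint32_t already_pushed = 8u; /* return address + saved ebp */
    return align_up(locals + outgoing_args + already_pushed) - already_pushed;
}
```

For example, with 20 bytes of locals and 12 bytes of outgoing parameters, frame_size gives 40: together with the 8 bytes already pushed, that is 48 bytes, a multiple of 16, so esp is aligned again for every call made from the procedure. Hand-written assembler that calls other procedures would need to reserve a comparable amount.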