[fpc-devel] i386-linux switched to a 16 byte aligned stack
J. Gareth Moreton
gareth at moreton-family.com
Tue Sep 17 07:24:22 CEST 2019
Ah, whoops, I misunderstood: this is only for i386-linux, not for
i386-win32 as well. Would there be benefits to aligning the stack on
that platform too, though?
Gareth aka. Kit
On 16/09/2019 13:32, J. Gareth Moreton wrote:
> It's a useful feature as far as hand-written and generated assembly
> language is concerned. The Intel SIMD instruction sets work far
> better with aligned memory (e.g. you can use MOVAPS instead of MOVUPS,
> the former being faster on older CPUs but raising a general-protection
> fault, which shows up as a segmentation fault on Linux, if the memory
> is unaligned). Granted, vectorcall currently only works on
> x86_64-win64, because I was able to reuse the code for the System V
> ABI, but an aligned stack might make it easier to port it to
> i386-win32 eventually. Under Microsoft Visual C++, __vectorcall is
> supported on 32-bit platforms by using only ECX and EDX as the integer
> registers, the same as __fastcall. Speaking of 'fastcall', I do wonder
> whether it's worth implementing that calling convention in case one
> wants to communicate with an external library that uses it.
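To make the MOVAPS point concrete, here is a small Python model of the i386 stack arithmetic. It is a sketch under stated assumptions, not anything taken from FPC: it assumes a conventional `push ebp; mov ebp, esp` prologue, and the names `movaps_ok`, `ebp_residue` and `aligned_ebp_offsets` are hypothetical. If esp is divisible by 16 before every call, the callee knows ebp mod 16 statically and can place 16-byte locals at fixed ebp offsets that MOVAPS may safely address. With only the old 4-byte guarantee, ebp mod 16 can take any of four values, so generated code has to fall back to MOVUPS or realign the stack at runtime.

```python
ALIGN = 16        # MOVAPS requires a 16-byte aligned memory operand
RETURN_ADDR = 4   # call pushes a 4-byte return address on i386
SAVED_EBP = 4     # assumed prologue: push ebp; mov ebp, esp

def movaps_ok(address: int) -> bool:
    """True if MOVAPS could use this address without faulting."""
    return address % ALIGN == 0

def ebp_residue(esp_mod16_before_call: int) -> int:
    """ebp mod 16 after the prologue, given esp mod 16 just before the call."""
    return (esp_mod16_before_call - RETURN_ADDR - SAVED_EBP) % ALIGN

def aligned_ebp_offsets(esp_mod16_before_call: int, count: int) -> list:
    """Negative ebp offsets of the first `count` 16-byte aligned local slots."""
    r = ebp_residue(esp_mod16_before_call)
    # A 16-byte slot at [ebp+off] must end at or below ebp (the saved ebp
    # lives at [ebp]), so off <= -16; subtracting r makes ebp+off a multiple of 16.
    first = -ALIGN - r
    return [first - ALIGN * i for i in range(count)]

# 16-byte aligned call site (the new i386-linux rule): ebp is always 8 mod 16.
print(aligned_ebp_offsets(0, 3))                      # [-24, -40, -56]
# Old 4-byte guarantee: ebp mod 16 could be any of these, so no fixed offset works.
print(sorted(ebp_residue(m) for m in (0, 4, 8, 12)))  # [0, 4, 8, 12]
```

With an aligned call site, `[ebp-24]`, `[ebp-40]` and `[ebp-56]` are safe MOVAPS operands that also stay clear of the saved ebp at `[ebp]`.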
>
> Gareth aka. Kit
>
> On 15/09/2019 21:07, Florian Klämpfl wrote:
>> On 15.09.19 at 19:35, Florian Klämpfl wrote:
>>> In r43005 to r43014 I committed a couple of patches so that FPC
>>> generates stack frames aligned to 16-byte boundaries on i386-linux
>>> (before a call instruction, esp is divisible by 16). This is done
>>> because it seems that Linux libraries have started to depend on this
>>> property, which gcc has ensured for around 20 years. To ensure it,
>>> FPC uses the same approach as clang (and as FPC already uses for
>>> i386-darwin): esp has a fixed value fulfilling the alignment
>>> requirement during the whole procedure, and outgoing parameters are
>>> copied onto the stack with mov instead of push instructions. The
>>> consequences of these changes are:
>>> - For pure Pascal programs, this does not change anything. The
>>> resulting code might be slightly bigger, but in turn floating-point
>>> code might be faster, as double values can now be properly aligned.
>>> - Most assembler code is not affected by the change. Only code that
>>> accesses the stack via esp with constant offsets might be affected;
>>> such code is rare.
>>> - Assembler code calling other procedures should be adapted to keep
>>> the stack aligned to 16-byte boundaries as well. Assembler code
>>> working on i386-darwin already fulfills this requirement. The define
>>> FPC_STACKALIGNMENT contains the stack alignment (16 in the case of
>>> i386-linux).
>>> - To test whether the stack is always properly aligned, compile with
>>> -Ct: the stack checking code for i386-linux now checks the stack
>>> alignment as well.
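The fixed-esp scheme described above can be sketched as simple frame-size arithmetic. This is a hypothetical Python model, not FPC's code generator: `frame_adjust` and `outgoing_arg_slots` are illustrative names, `STACK_ALIGN` mirrors the value FPC_STACKALIGNMENT holds on i386-linux, and the model assumes esp was 16-byte aligned at the call that entered the procedure. The prologue subtracts one amount from esp that covers the locals plus the largest outgoing-argument area, padded so esp ends on a 16-byte boundary; each call then stores its arguments with mov at fixed esp offsets, so esp never moves and stays aligned before every call.

```python
STACK_ALIGN = 16  # the value FPC_STACKALIGNMENT holds on i386-linux
RETURN_ADDR = 4   # bytes pushed by the call instruction on i386
SLOT = 4          # i386 stack arguments occupy multiples of 4 bytes

def align_up(n: int, a: int) -> int:
    """Round n up to a multiple of a (a must be a power of two)."""
    return (n + a - 1) & ~(a - 1)

def frame_adjust(saved_regs: int, locals_size: int, max_outgoing: int) -> int:
    """Bytes the prologue subtracts from esp so esp stays 16-byte aligned
    for the whole procedure (hypothetical helper, not FPC's code)."""
    already_below_call_site = RETURN_ADDR + saved_regs
    total = align_up(already_below_call_site + locals_size + max_outgoing,
                     STACK_ALIGN)
    return total - already_below_call_site

def outgoing_arg_slots(arg_sizes: list) -> list:
    """esp offsets at which each outgoing argument is stored with mov,
    first argument at [esp], where a push-based sequence would have left it."""
    offsets, off = [], 0
    for size in arg_sizes:
        offsets.append(off)
        off += align_up(size, SLOT)
    return offsets

adjust = frame_adjust(saved_regs=4, locals_size=20, max_outgoing=12)
print(adjust)                            # 40
print((RETURN_ADDR + 4 + adjust) % 16)   # 0
print(outgoing_arg_slots([4, 8, 2]))     # [0, 4, 12]
```

With `push ebp` (4 bytes of saved registers), 20 bytes of locals and at most 12 bytes of outgoing arguments, the prologue subtracts 40 bytes: 4 (return address) + 4 + 40 = 48, a multiple of 16. Individual pushes, by contrast, would shift esp by 4 bytes per argument, so each call site would need its own padding.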
>>
>> One thing (and actually an important one) I forgot to mention: this
>> also means that the regcall calling conventions we use by default on
>> i386-linux now use a caller-cleared stack. I forgot about it because
>> even our regression tests did not catch this. On the other hand, it
>> means that probably little code out there is affected by this; an
>> exception might be PascalScript.
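One plausible reading of why a fixed esp goes together with a caller-cleared stack, sketched as a hypothetical Python model (`esp_after_call` is an illustrative name, not FPC code): a callee that pops its own arguments with `ret n` leaves esp n bytes higher than before the call, so the caller would have to re-adjust esp after every call to restore the fixed, aligned value. With caller cleanup, the callee returns with a plain `ret` and esp is unchanged.

```python
def esp_after_call(esp_before_call: int, arg_bytes: int, callee_pops: bool) -> int:
    """Caller's esp right after the callee returns (hypothetical model).

    `call` pushes a 4-byte return address and `ret` pops it again;
    `ret n` (callee-cleared convention) additionally pops n argument bytes.
    """
    esp = esp_before_call - 4   # call pushes the return address
    esp += 4                    # ret pops the return address
    if callee_pops:
        esp += arg_bytes        # ret n discards the argument area too
    return esp

# Arguments were stored with mov into a reserved area; esp is 16-byte aligned.
esp = 0x1000
print(hex(esp_after_call(esp, 12, callee_pops=False)))  # 0x1000: still aligned
print(hex(esp_after_call(esp, 12, callee_pops=True)))   # 0x100c: alignment lost
```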
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>
>