[fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)
sergei_gorelkin at mail.ru
Wed Nov 1 12:03:36 CET 2017
01.11.2017 10:46, Sven Barth via fpc-devel wrote:
> On 01.11.2017 05:58, "J. Gareth Moreton" <gareth at moreton-family.com> wrote:
>> Would it be worth opening up a bug report for this, with the attached assembler routines as
>> suggestions? I haven't worked out how to implement internal functions in the compiler yet, and I'd
>> rather clear it with you guys first before I make such an addition. I had a thought that the simple
>> routines above could be used when compiling for small code size, while larger, more advanced ones
>> are used when compiling for speed.
> Improvements like these are always welcome. Two points, however:
> The Fill* routines are not part of the compiler but of the RTL (the Pascal routines are in
> rtl/inc/generic.inc, the assembly ones reside in rtl/CPU/CPU.inc), and they aren't handled
> differently depending on the current optimization flags, so a one-size-fits-all implementation is
> needed (look at e.g. the i386 ones).
> I also think that you might need to handle memory that isn't correctly aligned for the assembler
> instructions (I didn't look at your routines in detail, so I don't know whether they'd need to be
> adjusted for that). A check of the i386 routines will probably help here as well.
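The alignment point can be illustrated with a minimal C sketch of the common head/body/tail pattern. `fill_dword` is a hypothetical analogue of FPC's `FillDWord(var x; count: SizeInt; value: DWord)`, not the RTL's actual code. For brevity it uses 8-byte stores; an SSE version would align the destination to 16 bytes in the same way before using aligned stores such as `movdqa`. One routine handles every count and every alignment, in the one-size-fits-all spirit described above. Treating `count <= 0` as a no-op is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C analogue of FPC's FillDWord (a sketch, not RTL code):
   stores `count` copies of `value` starting at `dest`. */
static void fill_dword(void *dest, ptrdiff_t count, uint32_t value)
{
    unsigned char *p = (unsigned char *)dest;
    /* Two copies of value side by side, so one 8-byte store writes two
       DWords (the byte layout comes out right on either endianness). */
    uint64_t pattern = ((uint64_t)value << 32) | value;

    if (count <= 0)
        return; /* assumption of this sketch: non-positive count is a no-op */
    assert(dest != NULL);

    /* Head: a DWord-aligned pointer that is not QWord-aligned needs one
       DWord store to reach an 8-byte boundary. */
    if (((uintptr_t)p & 7u) == 4u) {
        memcpy(p, &value, 4);
        p += 4;
        count--;
    }

    /* Body: two DWords per 8-byte store. If dest was not DWord-aligned to
       begin with, these stores stay misaligned; memcpy keeps that well
       defined in C, and x86-64 tolerates unaligned 8-byte moves. */
    while (count >= 2) {
        memcpy(p, &pattern, 8);
        p += 8;
        count -= 2;
    }

    /* Tail: at most one DWord is left. */
    if (count == 1)
        memcpy(p, &value, 4);
}
```

For example, `fill_dword(buf + 3, 5, 0x11223344u)` writes 20 bytes starting at a misaligned address without ever touching the bytes before or after that range.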
Another important thing to note is that all modifications to the stack pointer and nonvolatile
registers on x86_64 need SEH annotations on Win64 and CFI annotations on Linux/BSD. The former are
available only in AT&T syntax; the latter are not supported.
This requirement, together with the different parameter locations, makes writing assembler routines
for x86_64 much more complicated than for i386. For this reason, existing assembler routines in the
RTL avoid using nonvolatile registers as much as possible.
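To make the register constraints concrete, here is a small C sketch that encodes the integer-register rules of the two x86-64 ABIs as bit masks. It computes which registers a routine may clobber on both Win64 and Linux/BSD without saving them, and therefore without unwind annotations, provided it also leaves the stack pointer alone. The enum and helper names are illustrative only, and XMM registers are ignored even though Win64 also treats XMM6-XMM15 as nonvolatile. Assuming FPC's default x86_64 calling convention follows the platform ABI, FillDWord's `x`, `count` and `value` would arrive in RCX, RDX and R8 on Win64 but in RDI, RSI and RDX on Linux.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 general-purpose registers in their hardware encoding order. */
enum gpr { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
           R8, R9, R10, R11, R12, R13, R14, R15, GPR_COUNT };

#define REG_BIT(r) (UINT32_C(1) << (r))

/* Integer registers the callee must preserve (XMM registers ignored). */
static const uint32_t win64_nonvolatile =
    REG_BIT(RBX) | REG_BIT(RBP) | REG_BIT(RDI) | REG_BIT(RSI) | REG_BIT(RSP) |
    REG_BIT(R12) | REG_BIT(R13) | REG_BIT(R14) | REG_BIT(R15);

static const uint32_t sysv_nonvolatile =
    REG_BIT(RBX) | REG_BIT(RBP) | REG_BIT(RSP) |
    REG_BIT(R12) | REG_BIT(R13) | REG_BIT(R14) | REG_BIT(R15);

/* Integer parameter registers, in argument order. */
static const enum gpr win64_params[] = { RCX, RDX, R8, R9 };
static const enum gpr sysv_params[]  = { RDI, RSI, RDX, RCX, R8, R9 };

/* Register holding the index-th (0-based) integer argument. */
static enum gpr param_reg(int win64, int index)
{
    assert(index >= 0 && index < (win64 ? 4 : 6));
    return win64 ? win64_params[index] : sysv_params[index];
}

/* Registers that are volatile on BOTH ABIs, i.e. usable by a shared
   routine without saving and restoring them. */
static uint32_t scratch_on_both_abis(void)
{
    uint32_t all = REG_BIT(GPR_COUNT) - 1u;
    return all & ~win64_nonvolatile & ~sysv_nonvolatile;
}
```

The result is RAX, RCX, RDX and R8-R11. RSI and RDI are scratch registers under the System V ABI but nonvolatile on Win64, so a routine shared by both targets must either save them (with the corresponding annotations) or avoid them.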