[fpc-devel] FillWord, FillDWord and FillQWord are very poorly optimised on Win64 (not sure about x86-64 on Linux)

Sergei Gorelkin sergei_gorelkin at mail.ru
Wed Nov 1 12:03:36 CET 2017

01.11.2017 10:46, Sven Barth via fpc-devel wrote:
> On 01.11.2017 05:58, "J. Gareth Moreton" <gareth at moreton-family.com> wrote:
>     Would it be worth opening up a bug report for this, with the attached assembler routines as
>     suggestions? I haven't worked out how to implement internal functions into the compiler yet,
>     and I'd rather clear it with you guys first before I make such an addition.  I had a thought
>     that the simple routines above could be used when compiling for small code size, while
>     larger, more advanced ones are used when compiling for speed.
> Improvements like these are always welcome. Two points however:
> The Fill* routines are not part of the compiler, but of the RTL (the Pascal routines are in 
> rtl/inc/generic.inc, the assembly ones reside in rtl/CPU/CPU.inc) and they aren't handled 
> differently depending on the current optimization flags, so a one-size-fits-all implementation 
> is needed (look at e.g. the i386 ones).
> I also think that you might need to handle memory that isn't correctly aligned for the assembler 
> instructions (I didn't look at your routines in detail so I don't know whether they'd need to be 
> adjusted for that). A check of the i386 routines will probably help here as well.
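
To illustrate the alignment concern raised in the quoted text, here is a minimal sketch in C. It is not the RTL code and not the routines proposed in this thread. The name fill_word and its parameter order are hypothetical. It writes single words until the destination is 8-byte aligned, then stores four words at a time, then finishes the remainder. An odd destination address can never become 8-byte aligned in 2-byte steps, so that case falls back to per-word stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the FPC RTL routine): store `count` copies of the
   16-bit `value` starting at `dst`, whatever the alignment of `dst`. */
void fill_word(void *dst, size_t count, uint16_t value)
{
    unsigned char *p = dst;

    assert(dst != NULL || count == 0);

    /* Odd address: stepping by 2 never reaches an 8-byte boundary,
       so write every word with an alignment-agnostic store. */
    if ((uintptr_t)p & 1u) {
        for (size_t i = 0; i < count; i++)
            memcpy(p + 2 * i, &value, sizeof value);
        return;
    }

    /* Head: single words until p is 8-byte aligned. */
    while (count > 0 && ((uintptr_t)p & 7u) != 0) {
        memcpy(p, &value, sizeof value);
        p += 2;
        count--;
    }

    /* Body: replicate the word into all four 16-bit lanes of a 64-bit
       pattern and store four words per iteration. */
    uint64_t pattern = (uint64_t)value * 0x0001000100010001ULL;
    while (count >= 4) {
        memcpy(p, &pattern, sizeof pattern);
        p += 8;
        count -= 4;
    }

    /* Tail: the remaining zero to three words. */
    while (count > 0) {
        memcpy(p, &value, sizeof value);
        p += 2;
        count--;
    }
}
```

The memcpy calls stand in for plain stores so that the sketch stays valid C for any address. An assembler version would use the corresponding move instructions, with the same head/body/tail split.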

Another important thing to note is that all modifications to the stack pointer and nonvolatile 
registers on x86_64 need SEH annotations on win64 and CFI annotations on linux/bsd. The former are 
available only in AT&T syntax; the latter are not supported.
This requirement, together with different parameter locations, makes writing assembler routines for 
x86_64 much more complicated than for i386. For this reason, existing assembler routines in RTL 
avoid using nonvolatile registers as much as possible.
