[fpc-devel] vmul commutative optimization?
Marco van de Voort
core at pascalprogramming.org
Tue Nov 12 17:05:56 CET 2019
On 12/11/2019 at 16:08, J. Gareth Moreton wrote:
> It's true. With VMULSS, only the first parameter (third parameter
> under Intel notation) can be an address (source: Intel(R) 64 and IA-32
> Architectures Software Developer's Manual, Volume 2B, Page 4-154).
> I'll see if I can work in that optimisation for the commutative
> operations (+ and *) at some point from the node side.
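A rough sketch of the operand swap proposed in the quote, written in C
purely for illustration. It is not FPC's node code: op_kind, loc_kind,
binop and prefer_memory_right are hypothetical names, and the sketch
assumes an instruction that computes "left op right" where only the
right operand may be a memory reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operator and operand-location kinds; not FPC's node types. */
typedef enum { OP_ADD, OP_SUB, OP_MUL, OP_DIV } op_kind;
typedef enum { LOC_REGISTER, LOC_MEMORY } loc_kind;

typedef struct {
    op_kind  op;
    loc_kind left;   /* where the left operand currently lives */
    loc_kind right;  /* where the right operand currently lives */
} binop;

/* Only + and * may have their operands exchanged. */
static bool is_commutative(op_kind op)
{
    return op == OP_ADD || op == OP_MUL;
}

/*
 * Assumption: only the right operand may be a memory reference.
 * If the left operand is in memory and the right one is in a register,
 * swap them for commutative operators so that no separate load is needed.
 * Returns true when a swap was performed.
 */
static bool prefer_memory_right(binop *n)
{
    assert(n != NULL);
    if (is_commutative(n->op) &&
        n->left == LOC_MEMORY && n->right == LOC_REGISTER) {
        loc_kind tmp = n->left;
        n->left = n->right;
        n->right = tmp;
        return true;
    }
    return false;
}
```

For a subtraction or division the function leaves the operands alone,
since swapping them would change the result.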
Another tidbit I noticed while playing with (elements of) the complex
patch: if I set the element size to double (re: double; im: double),
then with vectorcall the compiler loads all data into registers.
However, if I make it single (i.e. the tcomplex is 8 bytes), the records
are loaded into integer registers, and the compiler stores them to the
stack and then reloads them.
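A small illustration of the size difference involved, in C rather than
Pascal. The type and function names are hypothetical, and the sizes
assume IEEE 754 float and double without padding, as on x86-64. It only
shows that the single-precision record is exactly as wide as one 64-bit
integer; it says nothing about how FPC decides where to put it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C counterparts of the two record layouts discussed. */
typedef struct { float  re, im; } complex_single;  /* 2 x 4 bytes */
typedef struct { double re, im; } complex_double;  /* 2 x 8 bytes */

/* Assumes IEEE 754 float/double and no padding, as on x86-64. */
static_assert(sizeof(complex_single) == 8,  "as wide as one 64-bit integer");
static_assert(sizeof(complex_double) == 16, "wider than one 64-bit integer");

/* View the 8-byte record as the single 64-bit integer it can travel in. */
static uint64_t as_integer_bits(complex_single c)
{
    uint64_t bits;
    memcpy(&bits, &c, sizeof bits);
    return bits;
}

/* Recover the two floats from that 64-bit value. */
static complex_single from_integer_bits(uint64_t bits)
{
    complex_single c;
    memcpy(&c, &bits, sizeof c);
    return c;
}
```

The round trip through a 64-bit integer is lossless, but the two floats
have to be moved out of that integer before any floating-point
arithmetic can be done on them.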
This matters less for me, since it won't vectorize anyway (see the inline
and philosophy thread). I think I'll change this routine to assembler,
accepting a pointer and loading from and storing to it.
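The shape of such a pointer-based routine, sketched in plain C rather
than assembler. The names are hypothetical, and the sketch assumes the
routine is a complex multiply, which the post does not actually state.
The point is only the interface: operands are read through pointers and
the result is written through a pointer, so no record is passed or
returned by value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 8-byte complex record (two single-precision floats). */
typedef struct { float re, im; } complex_single;

/*
 * Assumption: the routine in question is a complex multiply.
 * Reads both operands through pointers and stores the product
 * through the out pointer.
 */
static void complex_mul(const complex_single *a, const complex_single *b,
                        complex_single *out)
{
    assert(a != NULL && b != NULL && out != NULL);

    /* Load all four components first so that out may alias a or b. */
    float ar = a->re, ai = a->im;
    float br = b->re, bi = b->im;

    out->re = ar * br - ai * bi;
    out->im = ar * bi + ai * br;
}
```

Calling it with a = 1 + 2i and b = 3 + 4i stores -5 + 10i in out.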