[fpc-devel] Question on updating FPC packages
J. Gareth Moreton
gareth at moreton-family.com
Tue Oct 29 12:23:15 CET 2019
When it comes to testing vectorcall, uComplex isn't actually the best
example, because most of its operators are inlined. There are a number
of tests under "tests/test/cg" that exercise vectorcall and the System V
ABI using a Pascal implementation of the opaque __m128 type (the two
ABIs should behave exactly the same when dealing with simple vectors).
If anything though, the example function you gave would be a pretty
solid and heavy-duty test of the compiler's attempts to vectorise the
code (I'll need to double-check what ComplexScl does, in case it isn't
a simple multiplication). In an ideal world, individual calls to
ComplexAdd and ComplexSub (which are simple + and - operations in
uComplex) would each compile into a single instruction (ADDPD and
SUBPD respectively). Nevertheless, one could disable the inlining to
see how well the compiler handles the function chaining: with aligned
data, the result in XMM0 should be easy to move in one go to another
XMM register, if it isn't simply left in place as parameter data for
the next function.
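
As a rough illustration of the "disable the inlining and look at the
chaining" idea, here is a minimal sketch. The TComplex record and the
two functions are hypothetical stand-ins, not the uComplex
declarations, and the vectorcall directive assumes an x86_64 target
and a compiler version that accepts it (FPC 3.2 or later, as far as I
know). Whether the record really travels in a single XMM register
depends on how the type is declared and aligned and on the compiler
version, so the generated assembly (e.g. compiling with -al) is the
thing to inspect.

```pascal
program ChainSketch;

{$mode objfpc}

type
  { Hypothetical stand-in for a complex type: two doubles, 16 bytes. }
  TComplex = record
    Re, Im: Double;
  end;

{ Deliberately NOT marked inline, so every call goes through the
  calling convention and the register hand-over can be inspected. }
function ComplexAdd(A, B: TComplex): TComplex;
  {$ifdef CPUX86_64} vectorcall; {$endif}
begin
  Result.Re := A.Re + B.Re;
  Result.Im := A.Im + B.Im;
end;

function ComplexSub(A, B: TComplex): TComplex;
  {$ifdef CPUX86_64} vectorcall; {$endif}
begin
  Result.Re := A.Re - B.Re;
  Result.Im := A.Im - B.Im;
end;

var
  A, B, C, R: TComplex;
begin
  A.Re := 1.0; A.Im := 2.0;
  B.Re := 3.0; B.Im := 4.0;
  C.Re := 0.5; C.Im := 0.5;

  { Chained calls: the result of ComplexAdd feeds straight into
    ComplexSub as its first parameter. }
  R := ComplexSub(ComplexAdd(A, B), C);

  WriteLn(R.Re:0:1, ' ', R.Im:0:1);  { prints "3.5 5.5" }
end.
```

Comparing the assembly for the chained call with and without the
vectorcall directive should show whether the intermediate result stays
in registers or is spilled to the stack between the two calls.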
Gareth aka. Kit
On 29/10/2019 11:06, Marco van de Voort wrote:
> On 2019-10-27 at 09:02, Florian Klämpfl wrote:
>> I guess you're right. It just seems weird because the System V ABI
>> was designed from the start to use the MM registers fully, so long as
>> the data is aligned. In effect, it had vectorcall wrapped into its
>> design from the start. Granted, vectorcall has some advantages and
>> can deal with relatively complex aggregates that the System V ABI
>> cannot handle (for example, a record type that contains a normal
>> vector and information relating to bump mapping).
>>> I just hoped that making updates to uComplex, while ensuring
>>> existing Pascal code still compiles, would help take advantage of
>>> modern ABI designs.
>> Is there currently any example which shows that vectorcall has any
>> advantage with FPC? Else I would propose first to make FPC able to
>> take advantage of it and then talk about if we really add vectorcall.
>> Currently I fear, FPC gets only into trouble when using vectorcall as
>> it tries first to push everything into one xmm register and then
>> splits this again in the callee.
> Nils Haeck's FFT unit might be interesting. (same guy as nativejpg
> unit iirc, http://www.simdesign.nl)
> It is a D7 language level unit that uses a complex record and simple
> procedures as options. It should be easy to transpose to ucomplex. It
> is quite hll and switchable between single and double. (I use it in
> single mode, but to test vectorcall, obviously double mode would be
> And it has routines that do a variety of complex operations.
> procedure FFT_5(var Z: array of TComplex); // usage of open array is to make things generic. Could be solved differently.
> var
>   T1, T2, T3, T4, T5: TComplex;
>   M1, M2, M3, M4, M5: TComplex;
>   S1, S2, S3, S4, S5: TComplex;
> begin
>   T1 := ComplexAdd(Z, Z);
>   T2 := ComplexAdd(Z, Z);
>   T3 := ComplexSub(Z, Z);
>   T4 := ComplexSub(Z, Z);
>   T5 := ComplexAdd(T1, T2);
>   Z := ComplexAdd(Z, T5);
>   M1 := ComplexScl(c51, T5);
>   M2 := ComplexScl(c52, ComplexSub(T1, T2));
>   M3.Re := -c53 * (T3.Im + T4.Im); // replace by i*add(t3,t4).scale(c53-i*c53) ?
>   M3.Im := c53 * (T3.Re + T4.Re);
>   M4.Re := -c54 * T4.Im;
>   M4.Im := c54 * T4.Re;
>   M5.Re := -c55 * T3.Im;
>   M5.Im := c55 * T3.Re;
>   S3 := ComplexSub(M3, M4);
>   S5 := ComplexAdd(M3, M5);
>   S1 := ComplexAdd(Z, M1);
>   S2 := ComplexAdd(S1, M2);
>   S4 := ComplexSub(S1, M2);
>   Z := ComplexAdd(S2, S3);
>   Z := ComplexAdd(S4, S5);
>   Z := ComplexSub(S4, S5);
>   Z := ComplexSub(S2, S3);
> end;