[fpc-devel] Successful implementation of inline support for pure assembler routines on x86
J. Gareth Moreton
gareth at moreton-family.com
Mon Mar 18 10:10:41 CET 2019
I suppose what I'm trying to say is that I don't trust the compiler to
actually use a register in the intermediate stages, especially in larger
procedures. To use AES-NI as an example, will it write back the
partially-encrypted block to the stack after every round, or only after all
the rounds have been completed? It would be perfectly logical to use the
same variable to store the intermediate stage and the final stage if using
a chain of intrinsics, e.g.:
  ciphertext := _mm_pxor(plaintext, roundkey[0]);
  for x := 1 to 9 do
    ciphertext := _mm_aesenc(ciphertext, roundkey[x]);
  ciphertext := _mm_aesenclast(ciphertext, roundkey[10]);
Of course, it's probably better to manually expand the for-loop here no
matter if intrinsics or inline assembly is used, but I wouldn't want
ciphertext to be written to the stack or some other backing store until at
least the last round. Unless another variable is used, I can see the
compiler writing ciphertext to the stack after every intermediate step.
Granted, a lot of it is down to the programmer, e.g.:
intermediate := _mm_pxor(plaintext, roundkey[0]);
intermediate := _mm_aesenc(intermediate, roundkey[1]);
intermediate := _mm_aesenc(intermediate, roundkey[2]);
intermediate := _mm_aesenc(intermediate, roundkey[3]);
intermediate := _mm_aesenc(intermediate, roundkey[4]);
intermediate := _mm_aesenc(intermediate, roundkey[5]);
intermediate := _mm_aesenc(intermediate, roundkey[6]);
intermediate := _mm_aesenc(intermediate, roundkey[7]);
intermediate := _mm_aesenc(intermediate, roundkey[8]);
intermediate := _mm_aesenc(intermediate, roundkey[9]);
ciphertext := _mm_aesenclast(intermediate, roundkey[10]);
Or a messier version, but one that is more likely to keep the
intermediate stages in registers:
  ciphertext := _mm_aesenclast(
    _mm_aesenc(
      _mm_aesenc(
        _mm_aesenc(
          _mm_aesenc(
            _mm_aesenc(
              _mm_aesenc(
                _mm_aesenc(
                  _mm_aesenc(
                    _mm_aesenc(
                      _mm_pxor(plaintext, roundkey[0]),
                      roundkey[1]),
                    roundkey[2]),
                  roundkey[3]),
                roundkey[4]),
              roundkey[5]),
            roundkey[6]),
          roundkey[7]),
        roundkey[8]),
      roundkey[9]),
    roundkey[10]);
This is where I feel that inline assembler is a little cleaner and
gives a degree of certainty about efficiency.
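For comparison, the loop and the nested chain above can be written with the existing C intrinsics whose names the `_mm_*` calls in this message resemble (`_mm_xor_si128`, `_mm_aesenc_si128`, `_mm_aesenclast_si128` from `<immintrin.h>`). This is only a sketch, assuming GCC or Clang on x86-64. The names `encrypt_loop`, `encrypt_nested`, `cpu_has_aesni` and `variants_agree`, the flat round-key layout and the filler key bytes are all made up for illustration. No key expansion is performed, so it is not a complete AES implementation. It shows only that both forms compute the same block. Whether the state stays in an XMM register between rounds depends on the compiler and optimisation level, and would have to be checked in the generated assembly.

```c
#include <assert.h>     /* assert, for callers that check variants_agree() */
#include <cpuid.h>      /* __get_cpuid (GCC/Clang, x86 only) */
#include <immintrin.h>  /* SSE2 and AES-NI intrinsics */
#include <string.h>     /* memcmp */

#define AES128_ROUNDS 10

/* Load round key i from a flat buffer named `rk` (16 bytes per key). */
#define RK(i) _mm_loadu_si128((const __m128i *)(rk + 16 * (i)))

/* Loop form: `state` is reassigned on every round. */
__attribute__((target("aes,sse2")))
static void encrypt_loop(const unsigned char *in, const unsigned char *rk,
                         unsigned char *out)
{
    __m128i state = _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), RK(0));
    for (int x = 1; x < AES128_ROUNDS; x++)
        state = _mm_aesenc_si128(state, RK(x));
    state = _mm_aesenclast_si128(state, RK(AES128_ROUNDS));
    _mm_storeu_si128((__m128i *)out, state);
}

/* Nested form: no named intermediate variable between rounds. */
__attribute__((target("aes,sse2")))
static void encrypt_nested(const unsigned char *in, const unsigned char *rk,
                           unsigned char *out)
{
    __m128i result =
        _mm_aesenclast_si128(
         _mm_aesenc_si128(
          _mm_aesenc_si128(
           _mm_aesenc_si128(
            _mm_aesenc_si128(
             _mm_aesenc_si128(
              _mm_aesenc_si128(
               _mm_aesenc_si128(
                _mm_aesenc_si128(
                 _mm_aesenc_si128(
                  _mm_xor_si128(_mm_loadu_si128((const __m128i *)in), RK(0)),
                  RK(1)),
                 RK(2)),
                RK(3)),
               RK(4)),
              RK(5)),
             RK(6)),
            RK(7)),
           RK(8)),
          RK(9)),
         RK(10));
    _mm_storeu_si128((__m128i *)out, result);
}

/* Runtime check so the sketch does not fault on CPUs without AES-NI. */
static int cpu_has_aesni(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 25) & 1;  /* CPUID leaf 1, ECX bit 25 = AESNI */
}

/* Returns 1 if both forms agree, 0 if they differ, -1 without AES-NI. */
int variants_agree(void)
{
    unsigned char rk[16 * (AES128_ROUNDS + 1)], in[16], a[16], b[16];

    if (!cpu_has_aesni())
        return -1;
    /* Arbitrary filler bytes, NOT a real AES key schedule. */
    for (int i = 0; i < (int)sizeof rk; i++)
        rk[i] = (unsigned char)(i * 7 + 3);
    for (int i = 0; i < 16; i++)
        in[i] = (unsigned char)(i * 17);

    encrypt_loop(in, rk, a);
    encrypt_nested(in, rk, b);
    return memcmp(a, b, 16) == 0;
}
```

The `target("aes,sse2")` attribute lets the two routines use AES-NI without building the whole file with `-maes`. Compiling with `-O2 -S` and looking for stores to the stack between the `aesenc` instructions is one way to test the concern raised above on a given compiler.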
Gareth aka. Kit
On Mon 18/03/19 07:00 , "Sven Barth" pascaldragon at googlemail.com sent:
> J. Gareth Moreton wrote on Sun, 17 March 2019, 23:27:
> > I think one of the main issues with intrinsics is that you don't have
> > much control over where results are stored. Unless you're chaining a load
> > of intrinsics together in a mess of function calls in actual parameters,
> > the result is going to have to be stored in a local variable, which even on
> > good days will end up being stored on the stack, and problems can occur if
> > the stack isn't aligned and you're using SSE or AVX instructions. After
> > that, when you call the next intrinsic that uses the result, it will have
> > to recall that data from memory.
> And I believe that this is the advantage of intrinsics, because here the
> compiler *can* decide to use a different register. Especially if the
> compiler supports instruction scheduling and such. At work I've worked
> with AES-NI and I definitely preferred to work with the intrinsics and
> didn't have to care about what registers to use, because the compiler and
> optimizer took care of that. That is something that Pascal should stand
> for: ease of use. Assembler is not easy to use.
> Regards, Sven