[fpc-devel] x86_64 SHA1 implementation
J. Gareth Moreton
gareth at moreton-family.com
Sat Sep 16 15:13:55 CEST 2023
Hi everyone,
So this past week I've been building on Rika's work by adding an
assembly version of SHA-1 for x86_64 to complement Rika's i386 version.
So far I've successfully made a version that runs twice as fast as the
Pascal code. I hoped to go even faster by making use of the SSE2
instruction set, but currently the end result is slower even though
computing the common parts of 4 rounds simultaneously should be much
faster. This occurs even when I forgo writing to the stack and keep
pretty much all of the state within registers. Preliminary
investigation suggests that the slowdown comes from using MOVD/MOVQ to
transfer data between the XMM registers and the general-purpose
registers, since they belong to different parts of the CPU. I'm still
amazed it causes this much latency, though.
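To make the pattern concrete, here is a rough C sketch of one way the "common parts" of 4 rounds could be batched. It is not the actual assembly from this work. It assumes the common part is the addition of the round constant K to four message words W[t], and the names wk_batch4, sha1_round_ch, sha1_rounds4_batched and sha1_rounds4_scalar are made up for illustration. The SSE2 path does the four additions with a single PADDD, but each lane then has to be moved out of the XMM register before the scalar round can use it, which is the kind of transfer suspected above.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>                 /* SSE2 intrinsics */
#endif

#define SHA1_K0 UINT32_C(0x5A827999)   /* SHA-1 constant for rounds 0..19 */

/* Rotate left; n must be 1..31 so that neither shift is undefined. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    assert(n > 0 && n < 32);
    return (x << n) | (x >> (32 - n));
}

/* Hypothetical helper: computes W[t] + K for four consecutive rounds. */
static void wk_batch4(const uint32_t w[4], uint32_t wk[4])
{
#if defined(__SSE2__)
    __m128i v = _mm_add_epi32(_mm_loadu_si128((const __m128i *)w),
                              _mm_set1_epi32((int)SHA1_K0));
    /* Lane-by-lane extraction: written as XMM -> general-purpose moves
       (byte shift + MOVD). A compiler is free to pick another route,
       for example storing the whole vector to memory. */
    wk[0] = (uint32_t)_mm_cvtsi128_si32(v);
    wk[1] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 4));
    wk[2] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 8));
    wk[3] = (uint32_t)_mm_cvtsi128_si32(_mm_srli_si128(v, 12));
#else
    /* Portable fallback with the same result. */
    for (int i = 0; i < 4; i++)
        wk[i] = w[i] + SHA1_K0;
#endif
}

/* One SHA-1 round for t in 0..19; wk already holds W[t] + K.
   s[0..4] are the working variables a, b, c, d, e. */
static void sha1_round_ch(uint32_t s[5], uint32_t wk)
{
    uint32_t f = (s[1] & s[2]) | (~s[1] & s[3]);
    uint32_t tmp = rol32(s[0], 5) + f + s[4] + wk;
    s[4] = s[3];
    s[3] = s[2];
    s[2] = rol32(s[1], 30);
    s[1] = s[0];
    s[0] = tmp;
}

/* Four rounds with the W + K additions batched up front. */
static void sha1_rounds4_batched(uint32_t s[5], const uint32_t w[4])
{
    uint32_t wk[4];
    wk_batch4(w, wk);
    for (int i = 0; i < 4; i++)
        sha1_round_ch(s, wk[i]);
}

/* Reference: the same four rounds with the addition done per round. */
static void sha1_rounds4_scalar(uint32_t s[5], const uint32_t w[4])
{
    for (int i = 0; i < 4; i++)
        sha1_round_ch(s, w[i] + SHA1_K0);
}
```

Both round functions produce the same state; the only difference is where W[t] + K is computed, so any timing gap between them would come from the vector addition plus the cost of getting the lanes back into integer registers.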
I'll keep investigating to see whether I can squeeze out more
performance, but otherwise I may just have to fall back on a
non-SIMD-optimised implementation.
Kit