[fpc-devel] x86_64 SHA1 implementation

J. Gareth Moreton gareth at moreton-family.com
Sat Sep 16 15:13:55 CEST 2023


Hi everyone,

So this past week I've been building on Rika's work by adding an 
assembly version of SHA-1 for x86_64 to complement Rika's i386 version.  
So far I've successfully made a version that runs twice as fast as the 
Pascal code.  I hoped to go even faster by making use of the SSE2 
instruction set, but currently the end result is slower even though 
computing the common parts of 4 rounds simultaneously should be much 
faster.  This occurs even when I forgo writing to the stack and keep 
pretty much all of the state within registers.  Preliminary 
investigation suggests that the slowdown comes from using MOVD/MOVQ to 
transfer data between the XMM registers and general-purpose registers, 
since they belong to different parts of the CPU.  I'm still amazed that 
it causes this much latency, though.
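
To illustrate what "the common parts of 4 rounds" can mean, below is a 
minimal sketch in Python rather than assembly.  It is not the code under 
discussion; it assumes the common part is the message schedule plus the 
round constant, W[t] + K[t], which does not depend on the a..e state and 
so can be computed for four rounds at once.  All names (schedule_plus_k, 
compress, sha1) are made up for this example, and each Python list of four 
values stands in for one XMM register.

```python
import struct

MASK = 0xFFFFFFFF
# One round constant per group of 20 rounds (FIPS 180-4).
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)


def rol(x, n):
    """Rotate a 32-bit value left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK


def schedule_plus_k(block):
    """Return the 80 values W[t] + K[t], produced four rounds at a time."""
    w = list(struct.unpack(">16I", block))
    wk = []
    for t in range(0, 80, 4):
        if t >= 16:
            lanes = []
            for i in range(4):
                x = w[t + i - 8] ^ w[t + i - 14] ^ w[t + i - 16]
                if i < 3:
                    # Lanes 0..2 can read W[t+i-3]; lane 3 would need W[t],
                    # which is only now being computed in lane 0.
                    x ^= w[t + i - 3]
                lanes.append(rol(x, 1))
            # Fix up lane 3: rotation distributes over XOR, so the missing
            # term can be folded in afterwards as rol(W[t], 1).
            lanes[3] ^= rol(lanes[0], 1)
            w.extend(lanes)
        k = K[t // 20]  # a group of four never straddles a constant change
        wk.extend((w[t + i] + k) & MASK for i in range(4))
    return wk


def compress(state, block):
    """Run the 80 scalar rounds, consuming the precomputed W[t] + K[t]."""
    a, b, c, d, e = state
    wk = schedule_plus_k(block)
    for t in range(80):
        if t < 20:
            f = d ^ (b & (c ^ d))            # Ch
        elif t < 40 or t >= 60:
            f = b ^ c ^ d                    # Parity
        else:
            f = (b & c) | (b & d) | (c & d)  # Maj
        a, b, c, d, e = (rol(a, 5) + f + e + wk[t]) & MASK, a, rol(b, 30), c, d
    return tuple((s + v) & MASK for s, v in zip(state, (a, b, c, d, e)))


def sha1(message):
    """Pad the message, process it block by block, return the hex digest."""
    state = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    for off in range(0, len(padded), 64):
        state = compress(state, padded[off:off + 64])
    return struct.pack(">5I", *state).hex()
```

In an SSE2 version each group of four W[t] + K[t] values would sit in an 
XMM register, while the rounds themselves run in general-purpose 
registers.  The four values therefore have to cross over somehow, either 
through memory or through MOVD/MOVQ, which is the transfer described above.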

I'll keep investigating to see whether I can squeeze out more 
performance, but otherwise I may just have to fall back on a 
non-SIMD-optimised implementation.

Kit


