[fpc-devel] x86_64 SHA1 implementation
florian at freepascal.org
Fri Sep 15 23:48:19 CEST 2023
On 16.09.23 at 15:13, J. Gareth Moreton via fpc-devel wrote:
> Hi everyone,
> So this past week I've been building on Rika's work by adding an
> assembly version of SHA-1 for x86_64 to complement Rika's i386 version.
> So far I've successfully made a version that runs twice as fast as the
> Pascal code. I hoped to go even faster by making use of the SSE2
> instruction set, but currently the end result is slower even though
> computing the common parts of 4 rounds simultaneously should be much
> faster. This occurs even when I forgo writing to the stack and keep
> pretty much all of the state within registers. Preliminary
> investigation suggests that the slowdown comes from using MOVD/Q to
> transfer data between the XMM registers and general-purpose registers,
> since they are different parts of the CPU. I'm still amazed it causes
> this much latency though.
> I'll keep investigating and seeing if I can squeeze out more
> performance, but otherwise I may just have to fall back on a
> non-SIMD-optimised implementation.
As SHA-1 is basically deprecated and no longer recommended for use,
I wouldn't spend too much effort on this. Besides, for SHA-1 and
SHA-256 it might be even more useful to use the SHA CPU extensions where
available. Although they were only introduced with Ice Lake and Zen,
they will become more and more widely available in the future.
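
To illustrate the "if available" part: on x86, the SHA extensions are reported by CPUID leaf 7, sub-leaf 0, in bit 29 of EBX. The following is only a minimal sketch in C, not FPC code. It assumes GCC or Clang with `<cpuid.h>`, and the helper names `ebx_reports_sha` and `cpu_has_sha_ext` are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GCC or Clang targeting x86, which ship <cpuid.h>. */
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID leaf 7, sub-leaf 0: EBX bit 29 reports the SHA extensions. */
#define CPUID7_EBX_SHA_BIT 29u

/* Hypothetical helper: decode the SHA feature bit from a raw EBX value. */
static bool ebx_reports_sha(uint32_t ebx)
{
    return ((ebx >> CPUID7_EBX_SHA_BIT) & 1u) != 0;
}

/* Hypothetical helper: query the running CPU. Returns false when CPUID
   leaf 7 is not supported, or when this is not an x86 GCC/Clang build. */
static bool cpu_has_sha_ext(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    /* __get_cpuid_count returns 0 if the requested leaf is unavailable. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_reports_sha(ebx);
#else
    return false;
#endif
}
```

A SHA-1 routine could call `cpu_has_sha_ext()` once at startup and select either the SHA-extension code path or the plain x86_64 assembly version as the fallback.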