[fpc-devel] x86_64 SHA1 implementation
J. Gareth Moreton
gareth at moreton-family.com
Sat Sep 16 17:29:51 CEST 2023
Thanks for the resources - these will prove very useful! Newer Intel and
AMD processors also have dedicated SHA instructions. I know AMD Zen
supports them, but I'm not sure which Intel models were the first to do so.
For now I'm sticking with pure SSE2, since it is the newest instruction
set guaranteed to be available on all x86_64 processors. I can write
versions for SSSE3 and AVX later, but at the moment I'm trying to
identify the mysterious performance drops.
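
As a rough illustration of how SSSE3, AVX and SHA versions could later be chosen at run time while SSE2 stays the fallback, here is a minimal sketch. It is not FPC code: it assumes a Linux system where /proc/cpuinfo lists feature flags (sha_ni, avx, ssse3), and the labels and function names are hypothetical. A real implementation would more likely query CPUID directly.

```python
# Sketch only: pick the fastest available SHA-1 code path from CPU flags.
# Assumption: Linux-style /proc/cpuinfo flag names (sha_ni, avx, ssse3).
# The labels below are hypothetical, not identifiers from the FPC sources.
PREFERENCE = [
    ("sha_ni", "SHA extensions"),
    ("avx", "AVX"),
    ("ssse3", "SSSE3"),
]

def parse_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line, as a set."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def pick_implementation(flags):
    """Return the label of the most preferred code path the CPU supports."""
    for flag, label in PREFERENCE:
        if flag in flags:
            return label
    # SSE2 is part of the x86_64 baseline, so it needs no check there.
    return "SSE2 baseline"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or file unreadable: fall back to the baseline
    print(pick_implementation(parse_flags(text)))
```

The ordering simply puts the SHA extensions first and the guaranteed SSE2 path last; whether AVX should outrank SSSE3 on a given CPU is something to measure rather than assume.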
Kit
On 16/09/2023 16:18, Wayne Sherman wrote:
> J. Gareth Moreton via fpc-devel <fpc-devel at lists.freepascal.org> wrote:
>> So this past week I've been building on Rika's work by adding an
>> assembly version of SHA-1 for x86_64 to complement Rika's i386 version.
>> So far I've successfully made a version that runs twice as fast as the
>> Pascal code. I hoped to go even faster by making use of the SSE2
>> instruction set...
> In 2010 Intel published SSSE3 code to improve SHA1 performance. Later
> that year it was incorporated into OpenSSL ASM code. The OpenSSL code
> also includes AVX and SHA acceleration extensions.
>
> Intel Article:
> https://www.intel.com/content/www/us/en/developer/articles/technical/improving-the-performance-of-the-secure-hash-algorithm-1.html
>
> Brief on Intel SHA extensions (also works for AMD Zen and later CPUs)
> https://en.wikipedia.org/wiki/Intel_SHA_extensions
>
> OpenSSL x86 64-bit assembly code and performance chart
> https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl
>
> ######################################################################
> # Current performance is summarized in following table. Numbers are
> # CPU clock cycles spent to process single byte (less is better).
> #
> #               x86_64      SSSE3        AVX[2]
> # P4            9.05        -
> # Opteron       6.26        -
> # Core2         6.55        6.05/+8%     -
> # Westmere      6.73        5.30/+27%    -
> # Sandy Bridge  7.70        6.10/+26%    4.99/+54%
> # Ivy Bridge    6.06        4.67/+30%    4.60/+32%
> # Haswell       5.45        4.15/+31%    3.57/+53%
> # Skylake       5.18        4.06/+28%    3.54/+46%
> # Bulldozer     9.11        5.95/+53%
> # Ryzen         4.75        3.80/+24%    1.93/+150%(**)
> # VIA Nano      9.32        7.15/+30%
> # Atom          10.3        9.17/+12%
> # Silvermont    13.1(*)     9.37/+40%
> # Knights L     13.2(*)     9.68/+36%    8.30/+59%
> # Goldmont      8.13        6.42/+27%    1.70/+380%(**)
> #
> # (*) obviously suboptimal result, nothing was done about it,
> # because SSSE3 code is compiled unconditionally;
> # (**) SHAEXT result
>
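
For reading the table above: the percentages look consistent with (baseline cycles per byte / variant cycles per byte) - 1, give or take rounding in the published figures. The sketch below reproduces two rows on that assumption and converts cycles per byte into throughput. The function names and the 3 GHz clock are illustrative assumptions, not values from the OpenSSL source.

```python
# Sketch only: relate the table's cycles-per-byte figures to its percentages.
# Assumption: gain = baseline_cpb / variant_cpb - 1, which matches most rows
# to within rounding of the published numbers.

def speedup_percent(baseline_cpb, variant_cpb):
    """Percentage gain of a variant over the plain x86_64 code."""
    return (baseline_cpb / variant_cpb - 1.0) * 100.0

def throughput_mb_per_s(cpb, clock_hz):
    """Bytes hashed per second at a given clock, in units of 10^6 bytes."""
    return clock_hz / cpb / 1e6

if __name__ == "__main__":
    # Westmere: 6.73 (x86_64) vs 5.30 (SSSE3), listed as +27%
    print(round(speedup_percent(6.73, 5.30)))
    # Haswell: 5.45 (x86_64) vs 3.57 (AVX2), listed as +53%
    print(round(speedup_percent(5.45, 3.57)))
    # Hypothetical 3 GHz clock, chosen only for illustration
    print(throughput_mb_per_s(3.57, 3.0e9))
```

Because lower cycles per byte is better, the baseline goes in the numerator; swapping the two would understate the gain.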