[fpc-devel] x86_64 SHA1 implementation

Wayne Sherman wsherman at gmail.com
Sat Sep 16 17:18:53 CEST 2023


J. Gareth Moreton via fpc-devel <fpc-devel at lists.freepascal.org> wrote:
> So this past week I've been building on Rika's work by adding an
> assembly version of SHA-1 for x86_64 to complement Rika's i386 version.
> So far I've successfully made a version that runs twice as fast as the
> Pascal code.  I hoped to go even faster by making use of the SSE2
> instruction set...

In 2010 Intel published SSSE3 code to improve SHA-1 performance.  Later
that year it was incorporated into the OpenSSL assembly code.  The OpenSSL
code also includes code paths for AVX and the SHA acceleration extensions.

Intel Article:
https://www.intel.com/content/www/us/en/developer/articles/technical/improving-the-performance-of-the-secure-hash-algorithm-1.html

Brief on Intel SHA extensions (also supported by AMD Zen and later CPUs):
https://en.wikipedia.org/wiki/Intel_SHA_extensions

OpenSSL x86 64-bit assembly code and performance chart:
https://github.com/openssl/openssl/blob/master/crypto/sha/asm/sha1-x86_64.pl

######################################################################
# Current performance is summarized in following table. Numbers are
# CPU clock cycles spent to process single byte (less is better).
#
#               x86_64         SSSE3            AVX[2]
# P4            9.05           -
# Opteron       6.26           -
# Core2         6.55           6.05/+8%         -
# Westmere      6.73           5.30/+27%        -
# Sandy Bridge  7.70           6.10/+26%        4.99/+54%
# Ivy Bridge    6.06           4.67/+30%        4.60/+32%
# Haswell       5.45           4.15/+31%        3.57/+53%
# Skylake       5.18           4.06/+28%        3.54/+46%
# Bulldozer     9.11           5.95/+53%
# Ryzen         4.75           3.80/+24%        1.93/+150%(**)
# VIA Nano      9.32           7.15/+30%
# Atom          10.3           9.17/+12%
# Silvermont    13.1(*)        9.37/+40%
# Knights L     13.2(*)        9.68/+36%        8.30/+59%
# Goldmont      8.13           6.42/+27%        1.70/+380%(**)
#
# (*) obviously suboptimal result, nothing was done about it,
# because SSSE3 code is compiled unconditionally;
# (**) SHAEXT result
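
For reading the table: the percentages appear to be the throughput gain
over the plain x86_64 column, i.e. (baseline / optimized - 1) * 100,
rounded.  Below is a minimal Python sketch of that arithmetic, assuming
that interpretation.  The function names and the 3.0 GHz clock are my own
illustrative choices, not something taken from the OpenSSL source.

```python
def speedup_percent(baseline_cpb, optimized_cpb):
    """Throughput gain of the optimized path over the baseline, in percent.

    Both arguments are cycles per byte (lower is better), so the gain is
    the ratio baseline/optimized minus one.
    """
    return (baseline_cpb / optimized_cpb - 1.0) * 100.0


def throughput_mb_per_s(cycles_per_byte, clock_hz):
    """Bytes hashed per second at the given clock, in MB/s (10**6 bytes)."""
    return clock_hz / cycles_per_byte / 1e6


# Cycles per byte copied from the table above: (x86_64, SSSE3)
rows = {
    "Core2": (6.55, 6.05),
    "Westmere": (6.73, 5.30),
    "Haswell": (5.45, 4.15),
}

for cpu, (base, ssse3) in rows.items():
    print(f"{cpu:10s} SSSE3 gain: +{speedup_percent(base, ssse3):.0f}%")

# Hypothetical clock speed, only to turn cycles/byte into a rough MB/s figure.
ASSUMED_CLOCK_HZ = 3.0e9
print(f"Haswell SSSE3 at 3.0 GHz: {throughput_mb_per_s(4.15, ASSUMED_CLOCK_HZ):.0f} MB/s")
```

The three gains printed match the +8%, +27% and +31% entries in the table.
The larger SHAEXT figures look like they were rounded more coarsely
(Ryzen works out to about +146% against the listed +150%).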

