[fpc-devel] x86_64.inc CompareByte

Martok listbox at martoks-place.de
Mon Oct 23 11:01:14 CEST 2017


Using the code given below as "inner", I measure this:

Current Trunk:
O0 compare-byte-1 : 196065.112 +/- 896.754 cycles/inner [0.5 %CV 1.6 %R]
O1 compare-byte-1 : 196510.158 +/- 577.976 cycles/inner [0.3 %CV 1.1 %R]
O3 compare-byte-1 : 187540.922 +/- 706.167 cycles/inner [0.4 %CV 1.5 %R]
Patch from 2017-10-21:
O0 compare-byte-2 : 175831.632 +/- 965.972 cycles/inner [0.5 %CV 2.1 %R]
O1 compare-byte-2 : 176039.560 +/- 527.141 cycles/inner [0.3 %CV 1.0 %R]
O3 compare-byte-2 : 158527.167 +/- 661.690 cycles/inner [0.4 %CV 1.5 %R]
(%CV: coefficient of variation * 100%. %R: span as % of mean)
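
For anyone who wants to reproduce the two columns in brackets, here is a minimal sketch of how they could be computed from a set of raw cycle counts. It rests on assumptions: the "+/-" value is taken to be a sample standard deviation (the reported %CV values are consistent with stddev/mean), "span" is taken to be max minus min, and the sample values are made up for illustration, not taken from the runs above.

```pascal
program CycleStats;
{$mode objfpc}

// Hypothetical cycle counts, for illustration only.
const
  Samples: array[0..4] of Double = (1000.0, 1010.0, 990.0, 1005.0, 995.0);

var
  i: Integer;
  Sum, Mean, SqSum, StdDev, MinV, MaxV, CV, R: Double;
begin
  // First pass: sum, minimum and maximum.
  Sum := 0.0;
  MinV := Samples[0];
  MaxV := Samples[0];
  for i := Low(Samples) to High(Samples) do
  begin
    Sum := Sum + Samples[i];
    if Samples[i] < MinV then MinV := Samples[i];
    if Samples[i] > MaxV then MaxV := Samples[i];
  end;
  Mean := Sum / Length(Samples);

  // Second pass: sample standard deviation (n-1 in the denominator;
  // whether the original tool uses n or n-1 is an assumption here).
  SqSum := 0.0;
  for i := Low(Samples) to High(Samples) do
    SqSum := SqSum + Sqr(Samples[i] - Mean);
  StdDev := Sqrt(SqSum / (Length(Samples) - 1));

  CV := 100.0 * StdDev / Mean;       // %CV: coefficient of variation in percent
  R := 100.0 * (MaxV - MinV) / Mean; // %R: span (max - min) as percent of mean

  WriteLn('mean = ', Mean:0:3, ' +/- ', StdDev:0:3);
  WriteLn('%CV  = ', CV:0:1);
  WriteLn('%R   = ', R:0:1);
end.
```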

CPU:
 Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz Family 6 Model 60 Stepping 3 (Haswell)
 true single core clock (measured) 2.83 GHz


So the new version is a bit faster, but not by a large margin (10-15%). It is
statistically significant though.
While I'm at it, i386 could use some love:
O1 compare-byte-1 :  755247.183 +/- 8125.671 cycles/inner [1.1 %CV 4.5 %R]
That's 3.8 times slower than x64 for exactly the same code.
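
The percentages and the i386 ratio quoted above follow directly from the reported means. The sketch below redoes that arithmetic; the constant and function names are invented for this example, and only the numeric values come from the measurements in this message.

```pascal
program Speedup;
{$mode objfpc}

// Mean cycles/inner as reported above (x86_64 unless noted).
const
  TrunkO0 = 196065.112;  PatchO0 = 175831.632;
  TrunkO1 = 196510.158;  PatchO1 = 176039.560;
  TrunkO3 = 187540.922;  PatchO3 = 158527.167;
  I386O1  = 755247.183;  // i386 figure, O1

// Reduction in cycles relative to the old value, in percent.
function ReductionPct(OldCycles, NewCycles: Double): Double;
begin
  Result := 100.0 * (OldCycles - NewCycles) / OldCycles;
end;

begin
  WriteLn('O0 reduction: ', ReductionPct(TrunkO0, PatchO0):0:1, ' %');
  WriteLn('O1 reduction: ', ReductionPct(TrunkO1, PatchO1):0:1, ' %');
  WriteLn('O3 reduction: ', ReductionPct(TrunkO3, PatchO3):0:1, ' %');
  // Ratio of the i386 O1 figure to the x86_64 trunk O1 figure.
  WriteLn('i386 / x86_64 at O1: ', (I386O1 / TrunkO1):0:2);
end.
```

This should come out at roughly 10% for O0 and O1, about 15% for O3, and a ratio of about 3.8.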

Code:
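    // Fill both buffers with 0..99 random bytes.
    len:=random(100);
    for j:=0 to len-1 do
      begin
        buf1[j]:=random(256);
        buf2[j]:=random(256);
      end;

    // Make the first 1..10 bytes identical so the buffers share a common prefix.
    for j:=0 to random(10) do
      buf2[j]:=buf1[j];

    // Call the routine under test 10000 times on the same buffers.
    for j:=1 to 10000 do
      CompareBytePatch(buf1,buf2,len);      // or System.CompareByte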


-- 
Regards,
Martok

Ceterum censeo b32079 esse sanandam.



