[fpc-devel] x86_64.inc CompareByte
Martok
listbox at martoks-place.de
Mon Oct 23 11:01:14 CEST 2017
Using the code given below as "inner", I measure this:
Current Trunk:
O0 compare-byte-1 : 196065.112 +/- 896.754 cycles/inner [0.5 %CV 1.6 %R]
O1 compare-byte-1 : 196510.158 +/- 577.976 cycles/inner [0.3 %CV 1.1 %R]
O3 compare-byte-1 : 187540.922 +/- 706.167 cycles/inner [0.4 %CV 1.5 %R]
Patch from 2017-10-21:
O0 compare-byte-2 : 175831.632 +/- 965.972 cycles/inner [0.5 %CV 2.1 %R]
O1 compare-byte-2 : 176039.560 +/- 527.141 cycles/inner [0.3 %CV 1.0 %R]
O3 compare-byte-2 : 158527.167 +/- 661.690 cycles/inner [0.4 %CV 1.5 %R]
(%CV: coefficient of variation * 100%. %R: span (max - min) as % of mean)
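For readers who want to reproduce the bracketed figures, here is a minimal sketch of how they could be derived from a set of per-run cycle counts. It assumes the "+/-" value is the sample standard deviation (dividing it by the mean does reproduce the %CV column in the results above), and it assumes the span is max minus min. The procedure name Spread and the four sample values are made up for illustration; this is not the measurement tool used for the numbers in this mail.

```pascal
program SpreadStats;
{$mode objfpc}

// Hypothetical helper: computes mean, sample standard deviation,
// %CV (100 * sd / mean) and %R (100 * (max - min) / mean).
procedure Spread(const samples: array of Double;
                 out mean, sd, cv, r: Double);
var
  i: Integer;
  minV, maxV, sumSq: Double;
begin
  mean := 0;
  minV := samples[0];
  maxV := samples[0];
  for i := 0 to High(samples) do
  begin
    mean := mean + samples[i];
    if samples[i] < minV then minV := samples[i];
    if samples[i] > maxV then maxV := samples[i];
  end;
  mean := mean / Length(samples);

  sumSq := 0;
  for i := 0 to High(samples) do
    sumSq := sumSq + Sqr(samples[i] - mean);
  sd := Sqrt(sumSq / (Length(samples) - 1)); // needs at least 2 samples

  cv := 100 * sd / mean;
  r := 100 * (maxV - minV) / mean;
end;

var
  m, s, c, span: Double;
begin
  // Illustrative cycle counts only, not measured data.
  Spread([196000.0, 195200.0, 197100.0, 196400.0], m, s, c, span);
  WriteLn(m:0:3, ' +/- ', s:0:3, ' cycles/inner [',
          c:0:1, ' %CV ', span:0:1, ' %R]');
end.
```

A low %CV together with a %R of only a few percent, as in all rows here, suggests the runs were not disturbed much by scheduling or frequency changes.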
CPU:
Intel(R) Core(TM) i5-4200M CPU @ 2.50GHz Family 6 Model 60 Stepping 3 (Haswell)
true single core clock (measured) 2.83 GHz
So the new version is faster, but not by a large margin: about 10% fewer
cycles at O0/O1 and about 15% fewer at O3. The difference is statistically
significant, though.
While I'm at it, i386 could use some love:
O1 compare-byte-1 : 755247.183 +/- 8125.671 cycles/inner [1.1 %CV 4.5 %R]
That's 3.8 times slower than x64 for exactly the same code.
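The relative figures quoted above can be rechecked directly from the measured means. The following is just a small arithmetic sketch; the function name Saving is made up for illustration, and the constants are copied from the results in this mail.

```pascal
program RelativeChange;
{$mode objfpc}

// Percentage of cycles saved when going from oldV to newV.
function Saving(oldV, newV: Double): Double;
begin
  Result := 100 * (oldV - newV) / oldV;
end;

begin
  // x86_64: current trunk vs. patch from 2017-10-21 (cycles/inner)
  WriteLn('O0: ', Saving(196065.112, 175831.632):0:1, ' % fewer cycles');
  WriteLn('O1: ', Saving(196510.158, 176039.560):0:1, ' % fewer cycles');
  WriteLn('O3: ', Saving(187540.922, 158527.167):0:1, ' % fewer cycles');

  // i386 vs. x86_64, both compare-byte-1 at O1
  WriteLn('i386 / x86_64: ', (755247.183 / 196510.158):0:2, 'x');
end.
```

This gives roughly 10.3 %, 10.4 % and 15.5 % for the three optimization levels, and a factor of about 3.84 for i386.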
Code:
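  // buf1/buf2: byte buffers with room for at least 100 elements
  // (declarations not shown)
  len:=random(100);
  // fill both buffers with random bytes
  for j:=0 to len-1 do
  begin
    buf1[j]:=random(256);
    buf2[j]:=random(256);
  end;
  // make a random-length prefix (1 to 10 bytes) identical
  for j:=0 to random(10) do
    buf2[j]:=buf1[j];
  // the timed part: 10000 comparisons of the same buffers
  for j:=1 to 10000 do
    CompareBytePatch(buf1,buf2,len); // or System.CompareByte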
--
Regards,
Martok
Ceterum censeo b32079 esse sanandam.