[fpc-devel] x86_64.inc CompareByte
Florian Klämpfl
florian at freepascal.org
Sun Oct 29 23:18:28 CET 2017
On 23.10.2017 at 22:58, Markus Beth wrote:
> Here are the numbers for an Ivy Bridge CPU:
> The output for [1] using the current RTL CompareByte is:
> 9.001.275.281 cycles:u ( +- 0,00% )
> 28.000.560.462 instructions:u # 3,11 insn per cycle ( +- 0,00% )
> 2,654735815 seconds time elapsed ( +- 0,00% )
>
> The output for [1] using the x86_64_comparebyte3.patch CompareByte is:
> 9.002.038.628 cycles:u ( +- 0,01% )
> 26.000.559.441 instructions:u # 2,89 insn per cycle ( +- 0,00% )
> 2,655002891 seconds time elapsed ( +- 0,01% )
>
> The output for [2] using the current RTL CompareByte is:
> 227.941.173.371 cycles:u ( +- 0,00% )
> 734.077.388.160 instructions:u # 3,22 insn per cycle ( +- 0,00% )
> 67,215188648 seconds time elapsed ( +- 0,00% )
>
> The output for [2] using the x86_64_comparebyte3.patch CompareByte is:
> 210.694.292.040 cycles:u ( +- 0,00% )
> 524.341.215.569 instructions:u # 2,49 insn per cycle ( +- 0,00% )
> 62,129294243 seconds time elapsed ( +- 0,00% )
>
>
> With Florian's benchmark I also observe that the patched version is
> slightly slower than the original. But I have no idea why this is so.
I have committed your latest patch with a few changes: the loop entry is now aligned to 16 bytes, and I
used movb instead of movzbl and inc instead of add. For me (Haswell CPU) this works better. I also
think these changes are better on average.
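
For reference, the relative changes implied by the quoted Ivy Bridge perf numbers can be worked out with a small script. This is only a sketch: the counts are copied from the quoted output above, while the names RUNS, relative_change and summarize are made up for illustration and are not part of the benchmark or the RTL.

```python
# Counts copied from the quoted perf output (Ivy Bridge).
# Each pair is (current RTL CompareByte, x86_64_comparebyte3.patch).
# The dictionary layout and all names here are illustrative assumptions.
RUNS = {
    "[1]": {
        "cycles": (9_001_275_281, 9_002_038_628),
        "instructions": (28_000_560_462, 26_000_559_441),
        "seconds": (2.654735815, 2.655002891),
    },
    "[2]": {
        "cycles": (227_941_173_371, 210_694_292_040),
        "instructions": (734_077_388_160, 524_341_215_569),
        "seconds": (67.215188648, 62.129294243),
    },
}


def relative_change(before, after):
    """Return the change from before to after as a percentage of before."""
    return (after - before) / before * 100.0


def summarize(runs):
    """Map each benchmark label to {metric: percent change}."""
    return {
        label: {metric: relative_change(*pair) for metric, pair in metrics.items()}
        for label, metrics in runs.items()
    }


if __name__ == "__main__":
    for label, changes in summarize(RUNS).items():
        for metric, pct in changes.items():
            print(f"{label} {metric:12s} {pct:+.2f}%")
```

On these numbers, benchmark [2] executes roughly 28.6% fewer instructions with the patch but needs only about 7.6% fewer cycles, and benchmark [1] executes about 7.1% fewer instructions with essentially unchanged cycles. So on this CPU the instruction count alone does not predict the run time.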