[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt
Martin Frb
lazarus at mfriebe.de
Tue Jan 4 17:03:24 CET 2022
@Marco: I haven't played with popcnt yet => it could benefit from the
"const to var" change too.
So I played around a bit...
Of course, all this is Intel only...
1)
var
Mask8, Mask1: qword;
....
Mask8 := EIGHTYMASK;
Mask1 := ONEMASK;
Now the constants are no longer assigned inside the loop.
That also makes the loop shorter.
=> improves speed
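
A minimal sketch of the "const to var" change in (1), as a standalone program. The values of EIGHTYMASK and ONEMASK, the helper CountHighBits and its loop body are assumptions made up for illustration; the post only names the two constants and does not show the real loop.

```pascal
program ConstToVar;
{$mode objfpc}
{$Q-}{$R-}  // the multiply below relies on wrap-around, so keep checks off

const
  // Assumed values; the post names the constants but does not show them.
  EIGHTYMASK = QWord($8080808080808080);
  ONEMASK    = QWord($0101010101010101);

// Hypothetical helper: counts the bytes that have their top bit set.
function CountHighBits(const Data: array of QWord): QWord;
var
  Mask8, Mask1, x: QWord;
  i: PtrInt;
begin
  // Copy the constants into locals once, before the loop.
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;
  Result := 0;
  for i := 0 to High(Data) do
  begin
    x := (Data[i] and Mask8) shr 7;           // 0 or 1 in every byte
    Result := Result + ((x * Mask1) shr 56);  // byte sum ends up in the top byte
  end;
end;

var
  Buf: array[0..1] of QWord;
begin
  Buf[0] := QWord($8000000000000080);  // 2 bytes with the top bit set
  Buf[1] := QWord($00FF00FF00FF00FF);  // 4 bytes with the top bit set
  WriteLn(CountHighBits(Buf));         // prints 6
end.
```

Whether the locals end up in registers depends on the compiler version and optimization level, so the generated assembler should be checked.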
2)
//for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
//for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
i := (ByteCount-cnt) div sizeof(PtrInt) ;
repeat
....
dec(i);
until i = 0;
The orig:
for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
// r9 is reserved to hold the upper bound
.Lj26:
addq $1,%r10
...
cmpq %r10,%r9
jnle .Lj26
Since the counter var "i" is not accessed in the loop, its value does
not matter.
So
for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
// no longer needs to store an upper bound, but still has an extra
"test" since the "subq" is at the start of the loop
.Lj26:
subq $1,%r10
...
testq %r10,%r10
jnle .Lj26
// "repeat": the "dec" is now at the end of the loop, and there no
longer is a "test"
And that again reduced the loop size.
And apparently that gets it just below a critical point, as the time
gets a little better again.
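
A small sketch to check that the three loop forms in (2) run the same number of iterations. The variable names and the value of n are made up for illustration. One thing to watch: "repeat" always runs its body at least once, so if the count can be 0 it needs a guard, otherwise "dec(i)" steps past 0 and "until i = 0" does not stop the loop where intended.

```pascal
program LoopForms;
{$mode objfpc}

var
  n, i, a, b, c: PtrInt;
begin
  n := 5;  // stands in for (ByteCount-cnt) div sizeof(PtrInt)
  a := 0; b := 0; c := 0;

  // Original form: counts up, needs the upper bound kept around.
  for i := 1 to n do
    Inc(a);

  // Counting down to 0: same number of iterations, no upper bound to keep.
  for i := n - 1 downto 0 do
    Inc(b);

  // repeat form: the decrement sits at the end of the loop body.
  i := n;
  if i > 0 then  // guard: repeat would otherwise run once even for n = 0
    repeat
      Inc(c);
      Dec(i);
    until i = 0;

  WriteLn(a, ' ', b, ' ', c);  // prints 5 5 5
end.
```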
-------------
WITH the constants moved to var:
orig for: 547
downto:   547
repeat:   516