[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

Martin Frb lazarus at mfriebe.de
Tue Jan 4 17:03:24 CET 2022


@Marco: I haven't played with popcnt, but it could benefit from the
"const to var" change too.

So I played around a bit...

Of course, all this is Intel only.

1)
var
   Mask8, Mask1: qword;
....
   Mask8 := EIGHTYMASK;
   Mask1 := ONEMASK;

With that, the constants are no longer assigned inside the loop.
This also makes the loop shorter.

=> improves speed
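
A minimal sketch of the "const to var" change, for anyone who wants to try it in isolation. The mask values and the loop body are assumptions (the mail only names EIGHTYMASK and ONEMASK), and CountContinuationBytes is a hypothetical helper that counts bytes of the form 10xxxxxx:

```pascal
program MaskHoist;
{$mode objfpc}{$Q-}{$R-}

const
  // Assumed 64-bit values; the mail does not show the definitions.
  EIGHTYMASK = QWord($8080808080808080);
  ONEMASK    = QWord($0101010101010101);

// Hypothetical helper: counts bytes of the form 10xxxxxx in Count qwords.
function CountContinuationBytes(P: PQWord; Count: SizeInt): QWord;
var
  Mask8, Mask1, nx: QWord;
  i: SizeInt;
begin
  Result := 0;
  // Copy the constants into locals once, before the loop.
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;
  for i := 1 to Count do
  begin
    // Bit 0 of each byte of nx is set if that byte has bit 7 set and bit 6 clear.
    nx := ((P^ and Mask8) shr 7) and ((not P^) shr 6);
    // The multiplication sums the eight per-byte flags into the top byte.
    Inc(Result, (nx * Mask1) shr 56);
    Inc(P);
  end;
end;

const
  Data: array[0..1] of QWord = ($8080, $C0BF);
begin
  WriteLn(CountContinuationBytes(@Data[0], 2));
end.
```

Whether the locals actually end up in registers depends on the compiler version and optimization settings, so the generated assembly is worth checking.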

2)
   //for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
   //for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
   i := (ByteCount-cnt) div sizeof(PtrInt) ;
   repeat
     ....
   dec(i);
   until i = 0;


The original:
   for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
// r9 is reserved to hold the upper bound
.Lj26:
     addq    $1,%r10
...
     cmpq    %r10,%r9
     jnle    .Lj26

Since the counter var "i" is not accessed in the loop, its value does 
not matter.
So
   for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
// no longer needs to store an upper bound, but still has an extra 
"test" since the "subq" is at the start of the loop
.Lj26:
     subq    $1,%r10
...
     testq    %r10,%r10
     jnle    .Lj26

// With "repeat", the "dec" is at the end of the loop, and there no longer
is a "test".
That again reduces the loop size,
apparently to just below a critical point, as the time gets a little
better again.
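
A minimal sketch comparing the "for" and the "repeat" form of the loop. The loop body (a plain sum) and the names SumFor and SumRepeat are hypothetical. Note that "repeat" always runs its body at least once, so this version assumes a guard for a count of zero, which the "for" loop does not need:

```pascal
program CountdownLoop;
{$mode objfpc}

// Hypothetical loop body: sums Count qwords starting at P.
function SumFor(P: PQWord; Count: SizeInt): QWord;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Count do
  begin
    Inc(Result, P^);
    Inc(P);
  end;
end;

// Same work, counting down with the decrement at the end of the loop.
function SumRepeat(P: PQWord; Count: SizeInt): QWord;
var
  i: SizeInt;
begin
  Result := 0;
  // Without this guard a count of 0 would run the body and count down from -1.
  if Count <= 0 then
    Exit;
  i := Count;
  repeat
    Inc(Result, P^);
    Inc(P);
    Dec(i);
  until i = 0;
end;

const
  Data: array[0..3] of QWord = (1, 2, 3, 4);
begin
  WriteLn(SumFor(@Data[0], 4));
  WriteLn(SumRepeat(@Data[0], 4));
  WriteLn(SumRepeat(@Data[0], 0));
end.
```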

-------------
With the constants moved to var:
orig for: 547
downto:   547
repeat:   516


