[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

Martin Frb lazarus at mfriebe.de
Tue Jan 4 17:03:24 CET 2022

@Marco: I haven't played with popcnt yet => it could benefit from the
"const to var" change too.

So I played around a bit...

Of course, all of this is Intel only...

   Mask8, Mask1: qword;
   Mask8 := EIGHTYMASK;
   Mask1 := ONEMASK;

The constants are then no longer assigned inside the loop.
That also makes the loop shorter.

=> improves speed
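
A minimal sketch of the "const to var" idea, in case it helps. The value of EIGHTYMASK, the function name CountHighBitWords and the test data are assumptions for illustration only, they are not taken from the original routine. Whether the compiler then really keeps the mask in a register depends on the FPC version and the optimization settings.

```pascal
program ConstToVar;
{$mode objfpc}

const
  // Assumed value, for illustration only.
  EIGHTYMASK = QWord($8080808080808080);

// Counts the words that have the high bit set in at least one byte.
// Hypothetical helper, not the routine discussed in this thread.
function CountHighBitWords(const Data: array of QWord): Integer;
var
  Mask8: QWord;
  i: Integer;
begin
  Result := 0;
  Mask8 := EIGHTYMASK;   // assigned once, before the loop
  for i := 0 to High(Data) do
    if (Data[i] and Mask8) <> 0 then
      Inc(Result);
end;

var
  Buf: array[0..3] of QWord =
    (0, $80, $7F7F7F7F7F7F7F7F, QWord($8000000000000000));
begin
  WriteLn(CountHighBitWords(Buf));
end.
```

The only point is that the loop body refers to the local Mask8 rather than to the constant itself.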

   //for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
   //for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
   i := (ByteCount-cnt) div sizeof(PtrInt);
   repeat
     // ... loop body ...
     dec(i);
   until i = 0;

The orig:
   for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
// r9 is reserved to hold the upper bound
     addq    $1,%r10
     cmpq    %r10,%r9
     jnle    .Lj26

Since the counter var "i" is not accessed in the loop, its value does 
not matter.
   for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
// no longer needs to store an upper bound, but still has an extra
// "test" since the "subq" is at the start of the loop
     subq    $1,%r10
     testq    %r10,%r10
     jnle    .Lj26

// "repeat": there no longer is a "test"
That again reduced the loop size,
apparently to just below a critical point, as the time gets a little better.

WITH the constants moved to var:
orig for : 547
repeat: 516
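
For reference, a small self-contained sketch of the two loop forms compared above. SumFor, SumRepeat and the dummy loop body are hypothetical and only there for illustration. One thing to watch: a repeat..until body always runs at least once, so the countdown version needs a guard if the count can be 0.

```pascal
program LoopForms;
{$mode objfpc}

// Upward "for" loop: the compiler has to keep the upper bound around.
function SumFor(Count: PtrInt): PtrInt;
var
  i: PtrInt;
begin
  Result := 0;
  for i := 1 to Count do
    Inc(Result, 3);      // dummy body, does not read i
end;

// Countdown with repeat..until: the decrement is the last statement.
function SumRepeat(Count: PtrInt): PtrInt;
var
  i: PtrInt;
begin
  Result := 0;
  if Count <= 0 then
    Exit;                // guard: repeat..until runs at least once
  i := Count;
  repeat
    Inc(Result, 3);      // same dummy body
    Dec(i);
  until i = 0;
end;

begin
  // Both forms must execute the body the same number of times.
  if SumFor(1000) <> SumRepeat(1000) then
    Halt(1);
  if SumRepeat(0) <> 0 then
    Halt(1);
  WriteLn('ok');
end.
```

This only checks that both forms do the same work. What code is generated for each, and how the timings compare, depends on the compiler version and target.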
