[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt
lazarus at mfriebe.de
Tue Jan 4 17:03:24 CET 2022
@Marco: haven't played with popcnt => it could benefit from the "const to
So I played around a bit...
Of course, all this is Intel only....
Mask8, Mask1: qword;
Mask8 := EIGHTYMASK;
Mask1 := ONEMASK;
And the constant is no longer assigned inside the loop.
That also makes the loop shorter.
=> improves speed
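
For readers without the full thread, here is a minimal sketch of the "constants moved to var" idea. It is not the original routine: the values of EIGHTYMASK and ONEMASK, the function name CountContinuationBytes and the loop body are assumptions made for illustration (the body is modelled on the usual "count UTF-8 continuation bytes" trick). Only the Mask8/Mask1 assignments come from the post.

```pascal
program MaskHoistSketch;
{$mode objfpc}{$H+}
{$Q-}{$R-} // the multiply below wraps on purpose

const
  // Assumed values; the post only names these constants.
  EIGHTYMASK = QWord($8080808080808080);
  ONEMASK    = QWord($0101010101010101);

// Hypothetical helper: counts bytes of the form 10xxxxxx
// in Count qwords starting at Data.
function CountContinuationBytes(Data: PQWord; Count: SizeInt): SizeInt;
var
  Mask8, Mask1, nx: QWord;
  i: SizeInt;
begin
  Result := 0;
  // Copy the constants into locals once, before the loop,
  // so the loop body only refers to the variables.
  Mask8 := EIGHTYMASK;
  Mask1 := ONEMASK;
  for i := 1 to Count do
  begin
    // Bit 0 of each byte in nx is set when bit 7 is set and bit 6 is clear.
    nx := ((Data^ and Mask8) shr 7) and ((not Data^) shr 6);
    // The multiply sums the eight per-byte flags into the top byte.
    Inc(Result, SizeInt((nx * Mask1) shr 56));
    Inc(Data);
  end;
end;

const
  // 'ä', 'a', '€' in UTF-8, then ten ASCII bytes: 3 continuation bytes.
  Bytes: array[0..15] of Byte =
    ($C3, $A4, $61, $E2, $82, $AC, $62, $63,
     $64, $65, $66, $67, $68, $69, $6A, $6B);
var
  Buf: array[0..1] of QWord;
begin
  Move(Bytes, Buf, SizeOf(Buf)); // copy into qword-aligned storage
  WriteLn(CountContinuationBytes(@Buf[0], 2)); // prints 3
end.
```

Whether the locals really end up in registers depends on the FPC version and optimization level, so the generated assembler is worth checking.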
//for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
//for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
i := (ByteCount-cnt) div sizeof(PtrInt) ;
until i = 0;
for i := 1 to (ByteCount-cnt) div sizeof(PtrInt) do
// r9 is reserved to hold the upper bound
Since the counter var "i" is not accessed in the loop, its value does
for i := (ByteCount-cnt) div sizeof(PtrInt) - 1 downto 0 do
// no longer needs to store an upper bound, but still has an extra
"test" since the "subq" is at the start of the loop
// "repeat " , and there no longer is a "test"
And that again reduced the loop size.
And apparently just below a critical point, as the time gets a little better
WITH the constants moved to var:
orig for : 547
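
To make the three loop shapes discussed above easier to compare, here is a small self-contained sketch. The summing body and the names SumForUp, SumForDown and SumRepeat are hypothetical; only the loop headers follow the post. The placement of Dec(i) directly before "until" is also an assumption, since the loop body is not shown in the quoted fragment.

```pascal
program LoopShapeSketch;
{$mode objfpc}{$H+}

const
  N = 4;
  Data: array[0..N-1] of QWord = (1, 2, 3, 4);

// Counting up: the upper bound has to be kept somewhere for the compare.
function SumForUp(p: PQWord; Count: SizeInt): QWord;
var
  i: SizeInt;
begin
  Result := 0;
  for i := 1 to Count do
  begin
    Inc(Result, p^);
    Inc(p);
  end;
end;

// Counting down to 0: no upper bound to keep, the loop ends at zero.
function SumForDown(p: PQWord; Count: SizeInt): QWord;
var
  i: SizeInt;
begin
  Result := 0;
  for i := Count - 1 downto 0 do
  begin
    Inc(Result, p^);
    Inc(p);
  end;
end;

// repeat..until with the decrement as the last statement of the body.
function SumRepeat(p: PQWord; Count: SizeInt): QWord;
var
  i: SizeInt;
begin
  Result := 0;
  i := Count;
  if i <= 0 then
    Exit; // repeat..until always runs its body at least once
  repeat
    Inc(Result, p^);
    Inc(p);
    Dec(i);
  until i = 0;
end;

begin
  // All three variants visit the same elements: prints 10 10 10
  WriteLn(SumForUp(@Data[0], N), ' ', SumForDown(@Data[0], N), ' ',
          SumRepeat(@Data[0], N));
end.
```

Note that the repeat form needs the explicit guard for an empty range, which the two for loops handle on their own.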