[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt

Marco van de Voort fpc at pascalprogramming.org
Tue Jan 4 18:01:12 CET 2022


On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
> I neglected to include -Cpcoreavx, that was my bad.  I'll try again.
>
> According to the Intel® 64 and IA-32 Architectures Software Developer’s
> Manual, Vol 2B, page 4-391, the zero flag is set if the source is
> zero, and cleared otherwise.  Regarding an undefined result, I got
> confused with the BSF and BSR instructions, sorry.  I guess I was more
> tired than I thought!  POPCNT returns zero for a zero input.

Ok, that's what I thought.
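
For reference, the POPCNT behaviour described in the quoted text can be modelled in plain C. This is only an illustrative sketch, not code from the compiler or from the benchmark under discussion; the names popcnt_result, popcnt_model and check_zero_input are hypothetical. It shows that a zero source gives a well-defined count of 0 with ZF set, unlike BSF/BSR, whose destination the Intel manual leaves undefined for a zero source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of POPCNT: the bit count plus the resulting zero flag. */
typedef struct {
    uint32_t count; /* number of set bits in the source */
    bool     zf;    /* ZF: set only when the source is zero */
} popcnt_result;

popcnt_result popcnt_model(uint32_t src)
{
    popcnt_result r = { 0, src == 0 };
    while (src != 0) {
        src &= src - 1; /* clear the lowest set bit */
        r.count++;
    }
    return r;
}

/* Zero input: well-defined count of 0, with ZF set. */
void check_zero_input(void)
{
    popcnt_result r = popcnt_model(0);
    assert(r.count == 0 && r.zf);
}
```

Since the count is zero exactly when the source is zero, code that only needs to test for an empty input can branch on ZF directly after POPCNT without a separate compare.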

I experimented with adding code alignment to the loops in the SSE code
(aligning before the branch location and/or the branch target), but it
only seems to slow the core loop down rather than speed it up.

Did you have any thoughts about moving up the NOT instruction?

