[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt
Marco van de Voort
fpc at pascalprogramming.org
Tue Jan 4 18:01:12 CET 2022
On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
> I neglected to include -Cpcoreavx, that was my bad. I'll try again.
>
> According to Intel® 64 and IA-32 Architectures Software Developer’s
> Manual, Vol 2B, Page 4-391. The zero flag is set if the source is
> zero, and cleared otherwise. Regarding an undefined result, I got
> confused with the BSF and BSR commands, sorry. I guess I was more
> tired than I thought! POPCNT returns zero for a zero input.
Ok, that's what I thought.
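To make the zero-input case concrete, here is a minimal sketch in C (not
the FPC code under discussion). The helper names popcount64 and
lowest_set_bit are hypothetical and only illustrate the semantics: a
population count is well defined for 0 and simply yields 0, while a
BSF-style bit scan has no meaningful answer for 0 and needs an explicit
guard.

```c
#include <stdint.h>

/* Hypothetical helper: portable population count.
   Well defined for every input; returns 0 when x is 0,
   matching what POPCNT produces for a zero source. */
static unsigned popcount64(uint64_t x)
{
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Hypothetical helper: index of the lowest set bit (what BSF computes).
   There is no valid index for 0, so the zero case is handled
   explicitly here and reported as -1. */
static int lowest_set_bit(uint64_t x)
{
    int i = 0;
    if (x == 0)
        return -1;
    while ((x & 1u) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}
```

So a caller of a popcount routine needs no special case for zero, whereas
a bit-scan caller has to check for it first.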
I experimented with adding code alignment to the loops in the SSE code
(aligning before the branch location and/or the branch target), but it
only seems to slow the core loop down rather than speed it up.
Did you have any thoughts about moving the NOT instruction up?