[fpc-devel] Attn: J. Gareth // 3.3.1 opt = slower // Fwd: [Lazarus] Faster than popcnt
    Marco van de Voort 
    fpc at pascalprogramming.org
       
    Tue Jan  4 18:01:12 CET 2022
    
    
  
On 4-1-2022 17:15, J. Gareth Moreton via fpc-devel wrote:
> I neglected to include -Cpcoreavx, that was my bad.  I'll try again.
>
> According to the Intel® 64 and IA-32 Architectures Software Developer’s 
> Manual, Vol 2B, page 4-391, the zero flag is set if the source is 
> zero, and cleared otherwise.  Regarding an undefined result, I got 
> confused with the BSF and BSR instructions, sorry.  I guess I was more 
> tired than I thought!  POPCNT returns zero for a zero input.
Ok, that's what I thought.
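For reference, here is a minimal sketch of those semantics in plain C (not Pascal, only so that it is easy to compile and check anywhere). It models what the manual says about POPCNT: the result is the number of set bits, and the zero flag is set exactly when the source is zero. The names popcnt_model and popcnt_result are hypothetical and exist only for this illustration; nothing here is taken from the FPC sources, and the loop is a software model, not the instruction itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of POPCNT: the bit count plus the resulting zero flag. */
typedef struct {
    unsigned count; /* number of set bits in the source */
    bool zf;        /* true exactly when the source is zero */
} popcnt_result;

/* Software model only: counts bits by repeatedly clearing the lowest set bit. */
static popcnt_result popcnt_model(uint64_t src)
{
    popcnt_result r;
    r.count = 0;
    r.zf = (src == 0);      /* ZF depends on the source, not on a later test */
    while (src != 0) {
        src &= src - 1;     /* clear the lowest set bit */
        r.count++;
    }
    return r;
}

/* Small self-check of the two cases discussed in the thread. */
static void popcnt_model_selfcheck(void)
{
    /* Zero input: a well-defined result of 0, with ZF set. */
    assert(popcnt_model(0).count == 0 && popcnt_model(0).zf);
    /* Non-zero input: ZF cleared. */
    assert(popcnt_model(0xF0).count == 4 && !popcnt_model(0xF0).zf);
}
```

The point of the model is that a zero input needs no special casing: the count is simply 0, unlike BSF/BSR, where the destination is undefined for a zero source.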
I experimented with adding code alignment to the loops in the SSE code 
(aligning before the branch location and/or the branch target), but it 
only seems to slow the core loop down rather than speed it up.
Did you have any thoughts about moving up the NOT instruction?