[fpc-devel] x86_64.inc CompareByte

Markus Beth markus.beth at zkrd.de
Mon Oct 16 23:08:12 CEST 2017


On 16.10.2017 22:41, Florian Klämpfl wrote:
>> P.S.: I am currently working on another version of CompareByte that might have a slightly higher
>> latency for very small len but a higher throughput (2 cycles per iteration vs. 3 cycles on an Intel
>> Arrandale CPU (Westmere microarchitecture)). But this would need some more testing and benchmarking.
>> I can come up with it here again if this would be of any interest.
> 
> Small lengths in terms of matching string or overall lengths?

Small in terms of the matching string: there is some setup work before 
the loop, and that cost matters most when the match is short.

> BTW: I would really like to see a PCMPSTR based implementation :)
PCMPSTR is out of scope for me at the moment. As far as I know, it is 
part of SSE4.2. How would you deal with Intel Core microarchitecture 
CPUs that don't have it?
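
One possible answer to that question, sketched here only for illustration, 
is to select the routine once at run time. The C sketch below assumes GCC 
or Clang on x86, where `__builtin_cpu_supports("sse4.2")` is available. The 
names `compare_byte_scalar`, `compare_byte_sse42` and `select_compare_byte` 
are hypothetical and not taken from x86_64.inc, and the SSE4.2 variant is 
only a placeholder that forwards to the scalar loop. The sign convention 
(negative, zero, positive) is assumed to match CompareByte.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*compare_fn)(const unsigned char *, const unsigned char *, size_t);

/* Hypothetical portable fallback: compares byte by byte and returns
   <0, 0 or >0 depending on the first differing byte. */
static int compare_byte_scalar(const unsigned char *a,
                               const unsigned char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (int)a[i] - (int)b[i];
    }
    return 0;
}

/* Placeholder for an SSE4.2 variant. A real one would use the packed
   string-compare instructions; this stand-in just calls the scalar
   routine so that the sketch stays correct on every CPU. */
static int compare_byte_sse42(const unsigned char *a,
                              const unsigned char *b, size_t len)
{
    return compare_byte_scalar(a, b, len);
}

/* Pick an implementation once, based on what the CPU reports.
   Falls back to the scalar routine on other compilers or targets. */
static compare_fn select_compare_byte(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        return compare_byte_sse42;
#endif
    return compare_byte_scalar;
}
```

A caller would store the result of `select_compare_byte()` in a function 
pointer at startup and call through it afterwards, so the feature check 
is paid once rather than per comparison.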


