[fpc-devel] x86_64.inc CompareByte

Florian Klämpfl florian at freepascal.org
Tue Oct 17 19:45:27 CEST 2017


Am 16.10.2017 um 23:08 schrieb Markus Beth:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that might have a slightly higher
>>> latency for very small len but a higher throughput (2 cycles per iteration vs. 3 cycles on an Intel
>>> Arrandale CPU (Westmere microarchitecture)). But this would need some more testing and benchmarking.
>>> I can bring it up here again if it is of any interest.
>>
>> Small lengths in terms of matching string or overall lengths?
> 
> Small in terms of the matching string, as there is some setup work before the loop.
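Markus's version is not shown in the thread. The following is a generic, hypothetical C sketch of the trade-off he describes, not his code. A loop that compares 8 bytes per iteration does more work per iteration. It also needs extra handling around the loop: the tail, and locating the mismatching byte inside a word. Inputs whose matching prefix is short pay that overhead without gaining much. The function names are illustrative only. The sign-only return convention (<0, 0, >0) follows FPC's documented `CompareByte` semantics.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time reference: returns <0, 0 or >0 depending on the first
   differing byte, like FPC's CompareByte (sign only is meaningful). */
static int compare_byte_scalar(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < len; i++)
        if (p[i] != q[i])
            return (int)p[i] - (int)q[i];
    return 0;
}

/* Hypothetical wider variant: compare 8 bytes per iteration and drop to
   byte-wise work only to locate the mismatch and to handle the tail. */
static int compare_byte_wide(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t x, y;
        memcpy(&x, p + i, 8);   /* alignment-safe 64-bit loads */
        memcpy(&y, q + i, 8);
        if (x != y)
            break;              /* a mismatch lies within these 8 bytes */
    }
    /* resolve the mismatching word (if any) plus the remaining tail */
    return compare_byte_scalar(p + i, q + i, len - i);
}
```

In a real implementation the mismatching byte would usually be found with a bit scan on `x ^ y` rather than a byte loop. The byte loop keeps the sketch endianness-agnostic.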
> 
>> BTW: I would really like to see a PCMPSTR based implementation :)
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR was part of SSE4.2. How would you deal
> with Intel Core microarchitecture CPUs that don't have it?

Just set a flag at startup if it is supported and then branch on the flag. As the flag never
changes, branch prediction will most likely work very well.
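A minimal C sketch of that scheme, assuming GCC or Clang on x86. `__builtin_cpu_supports`, the `target` attribute and the `_mm_cmpestri` intrinsic are compiler facilities, not FPC's. The names `compare_byte`, `has_sse42` and the helpers are hypothetical. The flag is set once in a constructor, roughly where an FPC unit's initialization section would do it. The SSE4.2 path uses PCMPESTRI in "equal each" mode with negative polarity, so the returned index is the first mismatching byte (16 if none).

```c
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>   /* SSE4.2 intrinsics, e.g. _mm_cmpestri */
#define HAVE_X86 1
#endif

/* Portable fallback: first differing byte decides the sign. */
static int compare_byte_generic(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;
    for (size_t i = 0; i < len; i++)
        if (p[i] != q[i])
            return (int)p[i] - (int)q[i];
    return 0;
}

#ifdef HAVE_X86
/* Compiled for SSE4.2; must only be called when the CPU supports it. */
__attribute__((target("sse4.2")))
static int compare_byte_sse42(const void *a, const void *b, size_t len)
{
    const unsigned char *p = a, *q = b;
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i x = _mm_loadu_si128((const __m128i *)(p + i));
        __m128i y = _mm_loadu_si128((const __m128i *)(q + i));
        /* index of the first byte where x and y differ, 16 if none */
        int k = _mm_cmpestri(x, 16, y, 16,
                             _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_EACH |
                             _SIDD_NEGATIVE_POLARITY | _SIDD_LEAST_SIGNIFICANT);
        if (k < 16)
            return (int)p[i + k] - (int)q[i + k];
    }
    /* tail of fewer than 16 bytes: byte loop, so nothing is read past len */
    return compare_byte_generic(p + i, q + i, len - i);
}
#endif

/* The flag: written once at startup, never changed afterwards. */
static int has_sse42 = 0;

__attribute__((constructor))
static void init_compare_byte(void)
{
#ifdef HAVE_X86
    __builtin_cpu_init();   /* required before cpu_supports in a constructor */
    has_sse42 = __builtin_cpu_supports("sse4.2") != 0;
#endif
}

static int compare_byte(const void *a, const void *b, size_t len)
{
#ifdef HAVE_X86
    if (has_sse42)          /* same outcome on every call: well predicted */
        return compare_byte_sse42(a, b, len);
#endif
    return compare_byte_generic(a, b, len);
}
```

Processing only full 16-byte blocks with PCMPESTRI and finishing with a byte loop avoids reading past the end of either buffer. This costs some speed on the tail.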
