[fpc-devel] x86_64.inc CompareByte

Markus Beth markus.beth at zkrd.de
Sat Oct 21 01:24:45 CEST 2017


Please find attached the version of CompareByte that I announced earlier.

BTW: If you would really like to see a PCMPSTR-based implementation, have
a look at Agner Fog's subroutine library asmlib.zip
(http://agner.org/optimize/).


On 16.10.2017 23:08, Markus Beth wrote:
> On 16.10.2017 22:41, Florian Klämpfl wrote:
>>> P.S.: I am currently working on another version of CompareByte that
>>> might have a slightly higher latency for very small len but a higher
>>> throughput (2 cycles per iteration vs. 3 cycles on an Intel Arrandale
>>> CPU (Westmere microarchitecture)). But this would need some more
>>> testing and benchmarking. I can come up with it here again if this
>>> would be of any interest.
>>
>> Small lengths in terms of matching string or overall lengths?
> 
> Small length in terms of the matching string, as there is some setup
> work before the loop.
> 
>> BTW: I would really like to see a PCMPSTR based implementation :)
> PCMPSTR is (at the moment) out of my scope. I thought PCMPSTR was part
> of SSE4.2. How would you deal with Intel Core microarchitecture CPUs
> that don't have it?
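
One common answer to the SSE4.2 question above is a runtime check with a
fallback. The C sketch below is only an illustration and is not taken from
the attached patch: all function names are made up for the example,
compare_byte_sse42 is a hypothetical placeholder that simply forwards to
the scalar loop, and __builtin_cpu_supports is a GCC/Clang builtin for x86
(other compilers would need their own CPUID query).

```c
#include <stddef.h>

/* Signature shared by all variants. It mirrors the CompareByte contract:
   the result is <0, 0 or >0 depending on the first differing byte. */
typedef ptrdiff_t (*compare_byte_fn)(const void *buf1, const void *buf2,
                                     size_t len);

/* Portable baseline that works on every CPU. */
static ptrdiff_t compare_byte_scalar(const void *buf1, const void *buf2,
                                     size_t len)
{
    const unsigned char *a = buf1;
    const unsigned char *b = buf2;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (ptrdiff_t)a[i] - (ptrdiff_t)b[i];
    }
    return 0;
}

/* Hypothetical placeholder for an SSE4.2 (PCMPxSTRx) variant.
   Assumption: a real version would be hand-written assembly or
   intrinsics; here it forwards to the scalar loop so the sketch
   stays runnable everywhere. */
static ptrdiff_t compare_byte_sse42(const void *buf1, const void *buf2,
                                    size_t len)
{
    return compare_byte_scalar(buf1, buf2, len);
}

/* Returns nonzero if the running CPU reports SSE4.2 support.
   On non-x86 targets or unknown compilers it conservatively returns 0. */
static int has_sse42(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse4.2") != 0;
#else
    return 0;
#endif
}

/* Picks the variant once; callers keep the returned pointer. */
static compare_byte_fn select_compare_byte(void)
{
    return has_sse42() ? compare_byte_sse42 : compare_byte_scalar;
}
```

Calling select_compare_byte() once at startup and caching the returned
pointer keeps the feature check out of the hot path, so CPUs without
SSE4.2 pay nothing beyond an indirect call.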
-------------- next part --------------
A non-text attachment was scrubbed...
Name: x86_64_comparebyte3.patch
Type: text/x-patch
Size: 1343 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20171021/340403bc/attachment.bin>

