[fpc-devel] x86_64.inc CompareByte

Florian Klämpfl florian at freepascal.org
Mon Oct 16 22:41:17 CEST 2017


On 16.10.2017 at 22:33, Markus Beth wrote:
> Sorry for the late reply. I had a weekend off(line).
> 
> The instructions were chosen on purpose and Sergey already cited the part of the Intel documentation
> that explains why this is correct. You can find a similar part in AMD "AMD64 Architecture
> Programmer’s Manual Volume 1: Application Programming":

Yes, Sergey is of course right, it was just too late at night yesterday :)

> 
>> 3.4.5 High 32 Bits
>> In 64-bit mode, the following rules apply to extension of results into
>> the high 32 bits when results smaller than 64 bits are written:
>>
>> * Zero-Extension of 32-Bit Results: 32-bit results are zero-extended
>>   into the high 32 bits of 64-bit GPR destination registers.
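
The rule quoted above can be modelled in a few lines of C. This is only a sketch of the register-write semantics, not code from the patch; the helper names write32 and write16 are made up for illustration. The 16-bit case is included for contrast, since 8- and 16-bit results leave the upper bits of the destination untouched.

```c
#include <stdint.h>

/* Illustrative model of writing to a 64-bit GPR in 64-bit mode.
 * write32/write16 are hypothetical helper names, not part of any API. */

/* 32-bit result: zero-extended, so the old upper 32 bits are cleared. */
static uint64_t write32(uint64_t reg, uint32_t value)
{
    (void)reg;                  /* previous contents do not survive */
    return (uint64_t)value;
}

/* 16-bit result: only the low 16 bits change, bits 16..63 are preserved. */
static uint64_t write16(uint64_t reg, uint16_t value)
{
    return (reg & ~UINT64_C(0xFFFF)) | value;
}
```

So after a 32-bit subtraction into a register, the full 64-bit register can be used as the result without an explicit extension, as long as an unsigned (zero-extended) value is what is wanted.
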
> 
> I think other x86_64 CPU manufacturers also adhere to this rule, since gcc relies on it as well.
> 
> I generally prefer the instructions operating on 32-bit operands over those operating on 64-bit
> operands where appropriate because they are typically encoded in fewer bytes, as they do not need a
> REX prefix.
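
To make the size difference concrete, here is a small C sketch of the encodings for a register-to-register subtract. The instruction pair is chosen only as an example and is not taken from the patch; rex_prefix is a hypothetical helper. Note that a 32-bit instruction still needs a REX prefix when it touches r8d..r15d, so the saving only applies to the legacy registers.

```c
#include <stddef.h>
#include <stdint.h>

/* REX prefix layout is 0100WRXB; W=1 selects 64-bit operand size.
 * rex_prefix is an illustrative helper, not part of any library. */
static uint8_t rex_prefix(int w, int r, int x, int b)
{
    return (uint8_t)(0x40 | (w << 3) | (r << 2) | (x << 1) | b);
}

/* subl %edx, %eax : opcode 29 /r, ModRM 0xD0, no prefix required */
static const uint8_t sub_eax_edx[] = { 0x29, 0xD0 };

/* subq %rdx, %rax : same opcode and ModRM, preceded by REX.W (0x48) */
static const uint8_t sub_rax_rdx[] = { 0x48, 0x29, 0xD0 };
```

The 64-bit form is one byte longer here, which is the typical cost of the REX prefix.
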
> 
> I have updated the patch (attached) to include a code path for 'oldbinutils' as Gareth suggested. In
> addition I switched the tails (.LCmpbyteZero and .LCmpbyteExitFast): when we leave the loop
> because the loop count reaches zero, we already know that the last bytes were the same and do not
> need to subq them.
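
The point about the two exits can be sketched in C. This is a hypothetical model of a CompareByte-style routine, not the assembler from the attached patch, and the name compare_byte is an assumption: the mismatch exit has to subtract the two bytes to get the sign, while the exit taken when the count runs out can return zero directly.

```c
#include <stddef.h>

/* Hypothetical C model of a CompareByte-style routine (not the patch itself).
 * Returns <0, 0 or >0 depending on the first differing byte. */
static long compare_byte(const unsigned char *a, const unsigned char *b,
                         size_t len)
{
    while (len != 0) {
        if (*a != *b) {
            /* Mismatch exit: the byte difference supplies the sign. */
            return (long)*a - (long)*b;
        }
        a++;
        b++;
        len--;
    }
    /* Count reached zero: all bytes matched, no subtraction needed. */
    return 0;
}
```
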
> 
> Markus
> 
> P.S.: I am currently working on another version of CompareByte that might have a slightly higher
> latency for very small len but a higher throughput (2 cycles per iteration vs. 3 cycles on an Intel
> Arrandale CPU (Westmere microarchitecture)). But this would need some more testing and benchmarking.
> I can bring it up here again if there is any interest.

Small lengths in terms of the matching part of the strings, or the overall length?

BTW: I would really like to see a PCMPxSTRx-based (SSE4.2) implementation :)



