[fpc-devel] x86_64 question
J. Gareth Moreton
gareth at moreton-family.com
Fri Oct 16 05:14:03 CEST 2020
I've simplified my test as much as I can, and hopefully I have something
that properly tests whether TEST has a false dependency or not. I'm
willing to admit that I may have been mistaken and the slowdown was
caused by something else.
The test functions effectively do a population count on the lowest 8
bits of a 32-bit integer... very inefficiently by calling TEST on each
of the 8 bits in turn and adding to a running total! I also have a
handful of versions that call POPCNT for comparison (needless to say it
wipes the floor with the TEST versions).
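For anyone who wants the gist without the attachment, here is a minimal C sketch of the per-bit counting idea described above. It is not the attached test program (which is not reproduced in this thread) and it measures nothing; the function names are hypothetical, and the comparison to TEST is only an analogy, since a C compiler is free to emit whatever instructions it likes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name. Counts the set bits in the lowest 8 bits of x by
   checking each bit in turn and adding to a running total, in the spirit
   of one TEST per bit. Bits 8..31 are ignored. */
static unsigned popcount8_by_test(uint32_t x)
{
    unsigned total = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        if (x & (UINT32_C(1) << bit))   /* roughly: TEST reg, 1 shl bit */
            ++total;
    }
    return total;
}

/* Hypothetical reference used only for checking: masks to 8 bits, then
   clears the lowest set bit until nothing is left. */
static unsigned popcount8_reference(uint32_t x)
{
    unsigned total = 0;
    for (x &= 0xFFu; x != 0; x &= x - 1)
        ++total;
    return total;
}

/* Checks every possible low byte, both with clean upper bits and with
   non-zero ("dirty") upper bits, since the upper bits must not change
   the count. */
static void check_all_low_bytes(void)
{
    for (uint32_t low = 0; low < 256; ++low) {
        uint32_t dirty = low | UINT32_C(0xABCDEF00);
        assert(popcount8_by_test(low) == popcount8_reference(low));
        assert(popcount8_by_test(dirty) == popcount8_reference(low));
    }
}
```

Calling check_all_low_bytes() from a main function exercises all 256 byte values twice. The dirty-upper-bits case is included because that is the situation where the slowdown was suspected, although this sketch only checks correctness, not timing.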
The program appears to run correctly on Win64 and Linux64, although I
can't vouch for the performance on Linux because of the overhead caused
by my copy being on a virtual machine. The program does some extensive
error checking on the results generated, and so far they haven't thrown
any errors.
Let me know how it goes.
Gareth aka. Kit
On 05/10/2020 14:39, Nikolay Nikolov via fpc-devel wrote:
> On 10/4/20 2:01 PM, J. Gareth Moreton via fpc-devel wrote:
>> Hi Nikolay,
>> I've got some good code to test, but I need to double-check with
>> someone to see if the licensing agreements allow (the code is rather
>> complex, but showcases the effect of the TEST instructions quite
>> well).
>> Is your platform a Windows or a Unix machine? I ask because I don't
>> want to send you functions that use the wrong calling convention!
> I dual boot Linux and Windows, but prefer testing on Linux.
> Best regards,
>> Gareth aka. Kit
>> On 02/10/2020 14:13, Nikolay Nikolov via fpc-devel wrote:
>>> On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote:
>>>> Confirmed my suspicions. If I zero the upper bits of the register
>>>> (I used something akin to "AND RCX, $F"), there is no speed loss.
>>>> Therefore, I can make the hypothesis, on my Intel(R) Core(TM)
>>>> i7-10750H, that using TEST on a sub-register causes a false
>>>> dependency if the bits outside of the subset are not zero, even
>>>> though the register isn't being modified.
>>> If you send me a test program, I can run it on my Ryzen 5 2500U to
>>> see how AMD behaves. We don't specifically optimize for AMD (yet),
>>> but it's interesting to know.
>>>> Gareth aka. Kit
>>>> On 02/10/2020 11:57, J. Gareth Moreton via fpc-devel wrote:
>>>>> So... I've done some tests, replacing TEST RCX, $4 with TEST CL,
>>>>> $4 and the like in a number-crunching function, and it seems to
>>>>> cause a notable penalty, even though none of the instructions are
>>>>> in my critical loop. So I think it's something that needs to be
>>>>> avoided in most cases. I think the reason why it worked in my Int
>>>>> and Frac functions is because the processor knows the upper 48
>>>>> bits of the register are zero.
>>>>> Long story short... best not to do it unless you have some
>>>>> additional insight into what the registers contain.
>>>>> Gareth aka. Kit
>>> fpc-devel maillist - fpc-devel at lists.freepascal.org
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 3415 bytes
Desc: not available