[fpc-devel] x86_64 question

J. Gareth Moreton gareth at moreton-family.com
Fri Oct 16 05:14:03 CEST 2020


Hi Nikolay,

I've simplified my test as much as I can, and hopefully I have something 
that properly tests whether TEST has a false dependency or not.  I'm 
willing to admit that I may have been mistaken and the slowdown was 
caused by something else.

The test functions effectively do a population count on the lowest 8 
bits of a 32-bit integer... very inefficiently, by executing TEST on 
each of the 8 bits in turn and adding to a running total! I also have a 
handful of versions that use POPCNT for comparison (needless to say, it 
wipes the floor with the TEST versions).
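
For anyone reading without the attachment, the logic of the test 
functions can be sketched in C roughly as follows.  This is only an 
illustration, not the code in deptest.zip: the function names are 
hypothetical, and a C compiler is free to emit something other than TEST 
for the bit checks, so it shows the counting logic and the result 
checking rather than reproducing the timing experiment.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name.  Counts the set bits in the lowest 8 bits of x by
   checking one bit at a time and adding to a running total.  On x86_64
   each check may compile to a TEST instruction, but that is up to the
   compiler. */
static unsigned popcount8_bit_tests(uint32_t x)
{
    unsigned total = 0;
    for (unsigned bit = 0; bit < 8; ++bit) {
        if (x & (UINT32_C(1) << bit))
            ++total;
    }
    return total;
}

/* Independent reference: mask to the low byte, then repeatedly clear
   the lowest set bit until nothing is left. */
static unsigned popcount8_reference(uint32_t x)
{
    unsigned total = 0;
    for (x &= 0xFFu; x != 0; x &= x - 1)
        ++total;
    return total;
}

/* Checks every possible low byte, once with the upper bits zero and
   once with them non-zero, since the upper bits must not affect the
   count. */
static void check_low8(void)
{
    for (uint32_t low = 0; low < 256; ++low) {
        uint32_t dirty = UINT32_C(0xDEADBE00) | low;
        assert(popcount8_bit_tests(low) == popcount8_reference(low));
        assert(popcount8_bit_tests(dirty) == popcount8_reference(low));
    }
}
```

Calling check_low8() from a main function runs the whole comparison; 
the inputs with non-zero upper bits are the case the false-dependency 
question is about.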

The program appears to run correctly on Win64 and Linux64, although I 
can't vouch for the performance on Linux because my copy runs in a 
virtual machine, which adds overhead.  The program does some extensive 
error checking on the results generated, and so far it hasn't flagged 
any problems.

Let me know how it goes.

Gareth aka. Kit

On 05/10/2020 14:39, Nikolay Nikolov via fpc-devel wrote:
>
> On 10/4/20 2:01 PM, J. Gareth Moreton via fpc-devel wrote:
>> Hi Nikolay,
>>
>> I've got some good code to test, but I need to double-check with 
>> someone to see if the licensing agreements allow (the code is rather 
>> complex, but showcases the effect of the TEST instructions quite 
>> nicely).
>>
>> Is your platform a Windows or a Unix machine?  I ask because I don't 
>> want to send you functions that use the wrong calling convention!
>
> I dual boot Linux and Windows, but prefer testing on Linux.
>
> Best regards,
>
>
> Nikolay
>
>>
>> Gareth aka. Kit
>>
>> On 02/10/2020 14:13, Nikolay Nikolov via fpc-devel wrote:
>>>
>>> On 10/2/20 2:13 PM, J. Gareth Moreton via fpc-devel wrote:
>>>> Confirmed my suspicions.  If I zero the upper bits of the register 
>>>> (I used something akin to "AND RCX, $F"), there is no speed loss.
>>>>
>>>> Therefore, I can make the hypothesis, on my Intel(R) Core(TM) 
>>>> i7-10750H, that using TEST on a sub-register causes a false 
>>>> dependency if the bits outside of the subset are not zero, even 
>>>> though the register isn't being modified.
>>>
>>> If you send me a test program, I can run it on my Ryzen 5 2500U to 
>>> see how AMD behaves. We don't specifically optimize for AMD (yet), 
>>> but it's interesting to know.
>>>
>>> Nikolay
>>>
>>>>
>>>> Gareth aka. Kit
>>>>
>>>> On 02/10/2020 11:57, J. Gareth Moreton via fpc-devel wrote:
>>>>> So... I've done some tests, replacing TEST RCX, $4 with TEST CL, 
>>>>> $4 and the like in a number-crunching function, and it seems to 
>>>>> cause a notable penalty, even though none of the instructions are 
>>>>> in my critical loop.  So I think it's something that needs to be 
>>>>> avoided in most cases.  I think the reason why it worked in my Int 
>>>>> and Frac functions is because the processor knows the upper 48 
>>>>> bits of the register are zero.
>>>>>
>>>>> Long story short... best not to do it unless you have some 
>>>>> additional insight into what the registers contain.
>>>>>
>>>>> Gareth aka. Kit
>>>>>
>>>>>
>>> _______________________________________________
>>> fpc-devel maillist  -  fpc-devel at lists.freepascal.org
>>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
>>>
>>
>


-------------- next part --------------
A non-text attachment was scrubbed...
Name: deptest.zip
Type: application/x-zip-compressed
Size: 3415 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20201016/bd82cba3/attachment.bin>

