[fpc-devel] ARM: AND/CMP -> TST optimisation produces incorrect results
J. Gareth Moreton
gareth at moreton-family.com
Wed Feb 28 16:14:53 CET 2024
Hi Garry,
Hopefully I have now fixed this issue, which was also causing problems
elsewhere. The fix is in this merge request:
https://gitlab.com/freepascal.org/fpc/source/-/merge_requests/598
It is just waiting to be verified, approved and merged.
Gareth aka. Kit
On 20/02/2024 06:32, J. Gareth Moreton via fpc-devel wrote:
>
> Thanks for the report and especially your investigative work. I'll
> take a look to see what's going on.
>
> Gareth aka. Kit
>
> On 20/02/2024 01:30, Garry Wood via fpc-devel wrote:
>>
>> Hello,
>>
>> Commit 6b2e4fa4 (main) entitled “* arm: "OpCmp2OpS" moved to Pass 2
>> so it doesn't conflict with AND; CMP -> TST optimisation” by Gareth
>> from Feb 11 2024 produces incorrect assembler in certain cases.
>>
>> https://gitlab.com/freepascal.org/fpc/source/-/commit/6b2e4fa4133a496c1c3f89e3c71fffbdd7c192fb
>>
>> This piece of code:
>>
>> function CPUMaskCount(CPUMask:LongWord):LongWord;
>> var
>>  Count:LongWord;
>> begin
>>  {}
>>  Result:=0;
>>  for Count:=CPU_ID_0 to CPU_ID_MAX do
>>   begin
>>    if (CPUMask and (1 shl Count)) <> 0 then
>>     begin
>>      Inc(Result);
>>     end;
>>   end;
>> end;
>>
>> when compiled with FPC prior to commit 6b2e4fa4 produces the
>> following working assembler:
>>
>> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>>    20528: e1a01000 mov r1, r0
>>    2052c: e3a00000 mov r0, #0
>>    20530: e3a02000 mov r2, #0
>>    20534: e3a03001 mov r3, #1
>>    20538: e0113213 ands r3, r1, r3, lsl r2
>>    2053c: 12800001 addne r0, r0, #1
>>    20540: e2822001 add r2, r2, #1
>>    20544: e352001f cmp r2, #31
>>    20548: 9afffff9 bls 20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>>    2054c: e12fff1e bx lr
>>
>> But when compiled with FPC after commit 6b2e4fa4 it produces this
>> assembler which doesn’t work:
>>
>> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>>    20528: e1a01000 mov r1, r0
>>    2052c: e3a00000 mov r0, #0
>>    20530: e3a02000 mov r2, #0
>>    20534: e3a03001 mov r3, #1
>>    20538: e1110003 tst r1, r3
>>    2053c: 12800001 addne r0, r0, #1
>>    20540: e2822001 add r2, r2, #1
>>    20544: e352001f cmp r2, #31
>>    20548: 9afffff9 bls 20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>>    2054c: e12fff1e bx lr
>>
>> You can see that the difference is the missing lsl r2 at the end of
>> the TST instruction, which means that the shl in the original code is
>> not performed and the test is therefore invalid.
>>
>> Similar code sequences in multiple other places produce the same
>> result with the lsl suffix missing from the TST instruction.
>>
>> Please let me know if you need any further information.
>>
>> Garry Wood.
>>
>>
>> _______________________________________________
>> fpc-devel maillist - fpc-devel at lists.freepascal.org
>> https://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-devel
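To make the effect of the dropped shifter operand concrete, here is a minimal sketch that models both listings in plain Pascal. It is only an illustration: the function names CountWithShift and CountWithoutShift are hypothetical, and the values CPU_ID_0 = 0 and CPU_ID_MAX = 31 are assumed from the "mov r2, #0" and "cmp r2, #31" instructions in the listings rather than taken from the original unit.

```pascal
program TstShiftDemo;

{$mode objfpc}

const
  { Assumed values; the original unit defines these elsewhere. }
  CPU_ID_0   = 0;
  CPU_ID_MAX = 31;

{ Models "ands r3, r1, r3, lsl r2": each iteration tests bit Count. }
function CountWithShift(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and (LongWord(1) shl Count)) <> 0 then
      Inc(Result);
end;

{ Models "tst r1, r3" without the shift: r3 is always 1, so every
  iteration tests bit 0 only. }
function CountWithoutShift(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and 1) <> 0 then
      Inc(Result);
end;

begin
  { Bits 0..3 set: the correct count is 4; the broken form counts 32. }
  WriteLn(CountWithShift($0000000F), ' ', CountWithoutShift($0000000F));
  { Bits 4..7 set: the correct count is 4; the broken form counts 0. }
  WriteLn(CountWithShift($000000F0), ' ', CountWithoutShift($000000F0));
end.
```

Under these assumptions, the unshifted test returns either 0 or 32 depending only on bit 0 of the mask, which matches the behaviour expected from the second listing.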