[fpc-devel] ARM: AND/CMP -> TST optimisation produces incorrect results
J. Gareth Moreton
gareth at moreton-family.com
Tue Feb 20 07:32:31 CET 2024
Thanks for the report and especially your investigative work. I'll take
a look to see what's going on.
Gareth aka. Kit
On 20/02/2024 01:30, Garry Wood via fpc-devel wrote:
>
> Hello,
>
> Commit 6b2e4fa4 (main) entitled “* arm: "OpCmp2OpS" moved to Pass 2 so
> it doesn't conflict with AND; CMP -> TST optimisation” by Gareth from
> Feb 11 2024 produces incorrect assembler in certain cases.
>
> https://gitlab.com/freepascal.org/fpc/source/-/commit/6b2e4fa4133a496c1c3f89e3c71fffbdd7c192fb
>
> This piece of code:
>
> function CPUMaskCount(CPUMask:LongWord):LongWord;
> var
>  Count:LongWord;
> begin
>  {}
>  Result:=0;
>  for Count:=CPU_ID_0 to CPU_ID_MAX do
>   begin
>    if (CPUMask and (1 shl Count)) <> 0 then
>     begin
>      Inc(Result);
>     end;
>   end;
> end;
>
> when compiled with FPC prior to commit 6b2e4fa4 produces the following
> working assembler:
>
> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>    20528: e1a01000  mov   r1, r0
>    2052c: e3a00000  mov   r0, #0
>    20530: e3a02000  mov   r2, #0
>    20534: e3a03001  mov   r3, #1
>    20538: e0113213  ands  r3, r1, r3, lsl r2
>    2053c: 12800001  addne r0, r0, #1
>    20540: e2822001  add   r2, r2, #1
>    20544: e352001f  cmp   r2, #31
>    20548: 9afffff9  bls   20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>    2054c: e12fff1e  bx    lr
>
> But when compiled with FPC after commit 6b2e4fa4 it produces this
> assembler which doesn’t work:
>
> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>    20528: e1a01000  mov   r1, r0
>    2052c: e3a00000  mov   r0, #0
>    20530: e3a02000  mov   r2, #0
>    20534: e3a03001  mov   r3, #1
>    20538: e1110003  tst   r1, r3
>    2053c: 12800001  addne r0, r0, #1
>    20540: e2822001  add   r2, r2, #1
>    20544: e352001f  cmp   r2, #31
>    20548: 9afffff9  bls   20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>    2054c: e12fff1e  bx    lr
>
> You can see that the difference is the missing lsl r2 at the end of
> the TST instruction, which means that the shl in the original code is
> not being performed and the test is therefore invalid.
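
A minimal sketch of a standalone check for the behaviour described above. It is not part of the original report and rests on several assumptions: CPU_ID_0 = 0 and CPU_ID_MAX = 31 are inferred from the listing (mov r2, #0 and cmp r2, #31), and ReferenceCount and FaultyMaskCount are hypothetical helper names. ReferenceCount counts bits by shifting the mask right, so it does not rely on the AND-with-shifted-operand pattern. FaultyMaskCount only models what the second listing appears to do, namely test bit 0 on every pass.

```pascal
program MaskCountCheck;

{$mode objfpc}

const
  { Assumed values, inferred from the disassembly in the report. }
  CPU_ID_0   = 0;
  CPU_ID_MAX = 31;

{ The function from the report. }
function CPUMaskCount(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
  begin
    if (CPUMask and (1 shl Count)) <> 0 then
    begin
      Inc(Result);
    end;
  end;
end;

{ Hypothetical reference: counts set bits by shifting the mask right. }
function ReferenceCount(Mask: LongWord): LongWord;
begin
  Result := 0;
  while Mask <> 0 do
  begin
    Inc(Result, Mask and 1);
    Mask := Mask shr 1;
  end;
end;

{ Hypothetical model of the faulty listing: with "lsl r2" dropped,
  "tst r1, r3" (r3 = 1) tests bit 0 on every iteration. }
function FaultyMaskCount(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and 1) <> 0 then
      Inc(Result);
end;

const
  Masks: array[0..3] of LongWord =
    ($00000000, $00000001, $0000000F, $80000000);

var
  I: Integer;
  Failed: Boolean;
begin
  Failed := False;
  for I := Low(Masks) to High(Masks) do
  begin
    WriteLn('mask=', Masks[I],
            ' reference=', ReferenceCount(Masks[I]),
            ' compiled=', CPUMaskCount(Masks[I]),
            ' faulty model=', FaultyMaskCount(Masks[I]));
    if CPUMaskCount(Masks[I]) <> ReferenceCount(Masks[I]) then
      Failed := True;
  end;
  { Non-zero exit code when the compiled function disagrees with the reference. }
  if Failed then
    Halt(1);
end.
```

Under these assumptions the faulty model returns 32 for any mask with bit 0 set and 0 otherwise. If a build of CPUMaskCount for ARM shows the same pattern in the "compiled" column, that would be consistent with the missing shift described in the report. Whether the fault reproduces may depend on the optimisation level used.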
>
> Similar code sequences in multiple other places produce the same
> result with the lsl suffix missing from the TST instruction.
>
> Please let me know if you need any further information.
>
> Garry Wood.