[fpc-devel] ARM: AND/CMP -> TST optimisation produces incorrect results

J. Gareth Moreton gareth at moreton-family.com
Tue Feb 20 07:32:31 CET 2024


Thanks for the report and especially your investigative work. I'll take
a look to see what's going on.

Gareth aka. Kit

On 20/02/2024 01:30, Garry Wood via fpc-devel wrote:
>
> Hello,
>
> Commit 6b2e4fa4 (main) entitled “* arm: "OpCmp2OpS" moved to Pass 2 so
> it doesn't conflict with AND; CMP -> TST optimisation” by Gareth, dated
> 11 Feb 2024, produces incorrect assembler in certain cases.
>
> https://gitlab.com/freepascal.org/fpc/source/-/commit/6b2e4fa4133a496c1c3f89e3c71fffbdd7c192fb
>
> This piece of code:
>
> function CPUMaskCount(CPUMask:LongWord):LongWord;
> var
>  Count:LongWord;
> begin
>  {}
>  Result:=0;
>  for Count:=CPU_ID_0 to CPU_ID_MAX do
>   begin
>    if (CPUMask and (1 shl Count)) <> 0 then
>     begin
>      Inc(Result);
>     end;
>   end;
> end;
>
> when compiled with FPC prior to commit 6b2e4fa4 produces the following 
> working assembler:
>
> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>    20528: e1a01000   mov    r1, r0
>    2052c: e3a00000   mov    r0, #0
>    20530: e3a02000   mov    r2, #0
>    20534: e3a03001   mov    r3, #1
>    20538: e0113213   ands   r3, r1, r3, lsl r2
>    2053c: 12800001   addne  r0, r0, #1
>    20540: e2822001   add    r2, r2, #1
>    20544: e352001f   cmp    r2, #31
>    20548: 9afffff9   bls    20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>    2054c: e12fff1e   bx     lr
>
> But when compiled with FPC after commit 6b2e4fa4 it produces this 
> assembler which doesn’t work:
>
> 00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
>    20528: e1a01000   mov    r1, r0
>    2052c: e3a00000   mov    r0, #0
>    20530: e3a02000   mov    r2, #0
>    20534: e3a03001   mov    r3, #1
>    20538: e1110003   tst    r1, r3
>    2053c: 12800001   addne  r0, r0, #1
>    20540: e2822001   add    r2, r2, #1
>    20544: e352001f   cmp    r2, #31
>    20548: 9afffff9   bls    20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
>    2054c: e12fff1e   bx     lr
>
> You can see that the only difference is the missing lsl r2 on the end
> of the TST instruction. Because r3 is always 1 at that point, the TST
> only ever tests bit 0 of CPUMask, so the shl in the original code is
> not being performed and the test is therefore invalid.
>
> Similar code sequences in several other places are miscompiled in the
> same way, with the lsl shift missing from the TST instruction.
>
> Please let me know if you need any further information.
>
> Garry Wood.
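The effect of the dropped shift can be checked without ARM hardware. The following Free Pascal program is a minimal sketch that models the two loops from the listings in the report in plain Pascal: one function mirrors the working "ands r3, r1, r3, lsl r2", the other mirrors the miscompiled "tst r1, r3" with r3 always 1. The values CPU_ID_0 = 0 and CPU_ID_MAX = 31 are assumptions inferred from the "mov r2, #0" / "cmp r2, #31" / "bls" sequence, and the function names are hypothetical.

```pascal
program TstMiscompileModel;

{$mode objfpc}
{$ASSERTIONS ON}

const
  { Assumed values, inferred from "mov r2, #0" and "cmp r2, #31" / "bls". }
  CPU_ID_0 = 0;
  CPU_ID_MAX = 31;

{ Models the working code: "ands r3, r1, r3, lsl r2" tests bit Count. }
function CountWithAndsShift(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and (LongWord(1) shl Count)) <> 0 then
      Inc(Result);
end;

{ Models the miscompiled code: "tst r1, r3" with r3 = 1 and no shift,
  so every iteration tests bit 0 only. }
function CountWithTstNoShift(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and LongWord(1)) <> 0 then
      Inc(Result);
end;

const
  Masks: array[0..3] of LongWord = ($00000000, $00000001, $0000000E, $FFFFFFFF);
var
  I: Integer;
begin
  for I := Low(Masks) to High(Masks) do
    WriteLn('CPUMask=', Masks[I],
            ' correct=', CountWithAndsShift(Masks[I]),
            ' miscompiled=', CountWithTstNoShift(Masks[I]));
end.
```

Under these assumptions the miscompiled loop can only return 0 or 32: a mask of $E gives 3 with the shift but 0 without it, and a mask of $1 gives 1 versus 32. Note that turning ANDS into TST is itself harmless here, because r3 is reloaded by "mov r3, #1" at 20534 on every iteration; only the loss of the shifter operand changes the result.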
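Going only by the two listings, the AND -> TST rewrite appears to keep Rn and Rm but not the register-controlled shift on the second operand. The sketch below is a hypothetical, simplified model, not FPC's actual aoptcpu code or its taicpu/toper representation: it represents an ARM data-processing instruction as a plain record and shows the invariant such a rewrite needs to keep, namely that the entire second operand, shift included, is carried over unchanged. All type, field and function names are illustrative.

```pascal
program AndsToTstSketch;

{$mode objfpc}
{$ASSERTIONS ON}

{ Hypothetical model of an ARM data-processing instruction; NOT FPC's
  internal representation. }
type
  TOpcode = (opAND, opTST);
  TShiftKind = (skNone, skLSL, skLSR, skASR, skROR);

  TOperand2 = record
    Rm: Integer;          { register operand }
    Shift: TShiftKind;    { shift applied to Rm }
    Rs: Integer;          { shift-amount register, used when Shift <> skNone }
  end;

  TDataProcInstr = record
    Opcode: TOpcode;
    SetFlags: Boolean;
    Rd: Integer;          { destination register; TST has none }
    Rn: Integer;
    Op2: TOperand2;
  end;

{ Rewrites "ANDS Rd, Rn, <op2>" into "TST Rn, <op2>". Assumes the caller
  has already checked that Rd is dead afterwards (true in the listing,
  where r3 is reloaded by "mov r3, #1" on every iteration). }
function AndsToTst(const Ands: TDataProcInstr): TDataProcInstr;
begin
  Result := Ands;           { copies Rn and the whole of Op2, shift included }
  Result.Opcode := opTST;
  Result.SetFlags := True;  { TST always sets the flags }
  Result.Rd := -1;          { no destination register }
end;

var
  Ands, Tst: TDataProcInstr;
begin
  { ands r3, r1, r3, lsl r2 }
  Ands.Opcode := opAND;
  Ands.SetFlags := True;
  Ands.Rd := 3;
  Ands.Rn := 1;
  Ands.Op2.Rm := 3;
  Ands.Op2.Shift := skLSL;
  Ands.Op2.Rs := 2;

  Tst := AndsToTst(Ands);
  { Prints "tst r1, r3, lsl r2" }
  WriteLn('tst r', Tst.Rn, ', r', Tst.Op2.Rm, ', lsl r', Tst.Op2.Rs);
  if Tst.Op2.Shift <> skLSL then
    Halt(1);
end.
```

Copying the whole record and then overriding only the opcode, flag and destination fields makes it hard to drop part of the operand by accident, which is the failure mode the listings show.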

