[fpc-devel] ARM: AND/CMP -> TST optimisation produces incorrect results

Garry Wood garry at softoz.com.au
Tue Feb 20 02:30:58 CET 2024


Hello,

Commit 6b2e4fa4 on main, titled "* arm: "OpCmp2OpS" moved to Pass 2 so it doesn't conflict with AND; CMP -> TST optimisation" (by Gareth, Feb 11 2024), produces incorrect ARM assembler in certain cases.

https://gitlab.com/freepascal.org/fpc/source/-/commit/6b2e4fa4133a496c1c3f89e3c71fffbdd7c192fb


This piece of code:

function CPUMaskCount(CPUMask:LongWord):LongWord;
var
 Count:LongWord;
begin
 {}
 Result:=0;
 for Count:=CPU_ID_0 to CPU_ID_MAX do
  begin
   if (CPUMask and (1 shl Count)) <> 0 then
    begin
     Inc(Result);
    end;
  end;
end;

when compiled with FPC prior to commit 6b2e4fa4 produces the following working assembler:

00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
   20528:   e1a01000    mov     r1, r0
   2052c:   e3a00000    mov     r0, #0
   20530:   e3a02000    mov     r2, #0
   20534:   e3a03001    mov     r3, #1
   20538:   e0113213    ands    r3, r1, r3, lsl r2
   2053c:   12800001    addne   r0, r0, #1
   20540:   e2822001    add     r2, r2, #1
   20544:   e352001f    cmp     r2, #31
   20548:   9afffff9    bls     20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
   2054c:   e12fff1e    bx      lr


But when compiled with FPC after commit 6b2e4fa4 it produces this assembler which doesn't work:

00020528 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD>:
   20528:   e1a01000    mov     r1, r0
   2052c:   e3a00000    mov     r0, #0
   20530:   e3a02000    mov     r2, #0
   20534:   e3a03001    mov     r3, #1
   20538:   e1110003    tst     r1, r3
   2053c:   12800001    addne   r0, r0, #1
   20540:   e2822001    add     r2, r2, #1
   20544:   e352001f    cmp     r2, #31
   20548:   9afffff9    bls     20534 <GLOBALCONFIG_$$_CPUMASKCOUNT$LONGWORD$$LONGWORD+0xc>
   2054c:   e12fff1e    bx      lr

The only difference is the instruction at 20538: the TST is missing the "lsl r2" shifted operand that the original ANDS carried. The shl from the source is therefore never performed, so every iteration tests bit 0 of CPUMask (r3 is always 1) instead of bit Count, and the result is wrong.

Similar code sequences in multiple other places produce the same result with the lsl suffix missing from the TST instruction.
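
For anyone wanting to reproduce this outside our source tree, a minimal test program along the following lines might help. It is only a sketch: CPU_ID_0 = 0 and CPU_ID_MAX = 31 are inferred from the loop bounds in the listings above rather than copied from the original unit, and BrokenMaskCount and Check are hypothetical helpers added purely for illustration. BrokenMaskCount models what the second listing computes, namely a test of bit 0 on every iteration.

```pascal
program tmaskcount;

{$mode objfpc}

const
  CPU_ID_0   = 0;   // assumption: the loop counter starts at #0 in the listing
  CPU_ID_MAX = 31;  // assumption: the loop ends at "cmp r2, #31"

{ The function from the report, unchanged apart from layout. }
function CPUMaskCount(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and (1 shl Count)) <> 0 then
      Inc(Result);
end;

{ Hypothetical model of the miscompiled loop: the shift is lost,
  so bit 0 of the mask is tested on every iteration. }
function BrokenMaskCount(CPUMask: LongWord): LongWord;
var
  Count: LongWord;
begin
  Result := 0;
  for Count := CPU_ID_0 to CPU_ID_MAX do
    if (CPUMask and 1) <> 0 then
      Inc(Result);
end;

{ Hypothetical helper: prints the results and stops with exit code 1
  if CPUMaskCount does not return the expected number of set bits. }
procedure Check(Mask, Expected: LongWord);
var
  Got: LongWord;
begin
  Got := CPUMaskCount(Mask);
  WriteLn('mask=', Mask, ' expected=', Expected, ' got=', Got,
          ' broken-model=', BrokenMaskCount(Mask));
  if Got <> Expected then
    Halt(1);
end;

begin
  Check($00000000, 0);
  Check($00000001, 1);
  Check($0000000F, 4);
  Check($80000000, 1);
  Check($FFFFFFFF, 32);
  WriteLn('ok');
end.
```

With a correct compiler this should print "ok". If the faulty TST sequence is generated, I would expect any mask with bit 0 set to count as 32 and any mask with bit 0 clear to count as 0, matching the broken-model column. I have not confirmed that the peephole optimisation triggers in exactly the same way in a standalone program, so the optimisation level or surrounding code may need adjusting.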

Please let me know if you need any further information.

Garry Wood.
