[fpc-devel] Bitset assembler

Jonas Maebe jonas.maebe at elis.ugent.be
Sun Sep 11 15:23:14 CEST 2016


On 11/09/16 15:11, Jeppe Johansen wrote:
> Here's an ARM version that runs in 5 cycles on a Cortex A8:
>     mov    r2,r1,lsr #5
>     mov    r12,#1
>     ldr    r3,[r0, r2, lsl #2]!
>     orr    r2,r3,r12,lsl r1
>     str    r2,[r0]
>     and    r0,r12,r3,lsr r1
>
> It's one cycle faster than what the compiler can generate due to it not
> doing the pre-indexed writeback optimization when the address
> calculation has shifts.

Given that this code will be in a non-inlinable routine (we can't
inline routines that contain inline assembler), the Pascal version is
probably faster in that case, since it can be inlined and then has no
call/return overhead.
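
To illustrate the kind of routine being compared, here is a minimal sketch
of an inlinable Pascal test-and-set, not the code from this thread. The
names TBitWords and TestAndSetBit and the 128-bit set size are hypothetical,
and the semantics (set the bit, return whether it was set before) are
inferred from the quoted assembler. The sketch masks the bit index with 31
explicitly.

```pascal
program BitsetSketch;

{$mode objfpc}

type
  // Hypothetical 128-bit set stored as four 32-bit words.
  TBitWords = array[0..3] of LongWord;

// Sets bit `index` in `bits` and returns True if it was already set.
// Marked inline so the compiler may expand it at the call site,
// which is not possible for a routine containing inline assembler.
function TestAndSetBit(var bits: TBitWords; index: LongWord): Boolean; inline;
var
  old, mask: LongWord;
begin
  old := bits[index shr 5];                // word holding the bit (32 bits per word)
  mask := LongWord(1) shl (index and 31);  // bit position inside that word
  bits[index shr 5] := old or mask;        // store the word with the bit set
  Result := (old and mask) <> 0;           // previous state of the bit
end;

var
  s: TBitWords;
begin
  FillChar(s, SizeOf(s), 0);
  WriteLn(TestAndSetBit(s, 37));  // first call: the bit was still clear
  WriteLn(TestAndSetBit(s, 37));  // second call: the bit is now set
end.
```

Whether the compiler actually inlines the call depends on the optimization
settings in use, so any speed difference should be measured rather than
assumed.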


Jonas



