[fpc-devel] Bitset assembler
jonas.maebe at elis.ugent.be
Sun Sep 11 15:23:14 CEST 2016
On 11/09/16 15:11, Jeppe Johansen wrote:
> Here's an ARM version that runs in 5 cycles on a Cortex A8:
> mov r2,r1,lsr #5
> mov r12,#1
> ldr r3,[r0, r2, lsl #2]!
> orr r2,r3,r12,lsl r1
> str r2,[r0]
> and r0,r12,r3,lsr r1
> It's one cycle faster than what the compiler can generate due to it not
> doing the pre-indexed writeback optimization when the address
> calculation has shifts.
Given that this code would have to live in a non-inlinable routine (we
can't inline routines that contain inline assembler), the Pascal version
is probably faster in practice, since it avoids the call/return overhead.
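A minimal C sketch of the operation the quoted ARM sequence appears to perform follows. It assumes a bitset test-and-set: set bit n in an array of 32-bit words and return the bit's previous value. It also assumes that bit 0 is the least significant bit of word 0. The function name `bitset_test_and_set` is hypothetical and is not an FPC RTL routine. Because this is plain high-level code rather than inline assembler, a compiler is free to inline it at each call site, which is the advantage the reply attributes to the Pascal version. C leaves shifts by 32 or more undefined, so the sketch masks the in-word position with `& 31`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: sets bit `bit` in `set` and returns its previous value.
   Assumes `set` has at least (bit >> 5) + 1 words. */
static inline bool bitset_test_and_set(uint32_t *set, uint32_t bit)
{
    uint32_t *word = &set[bit >> 5];      /* word index = bit / 32 (cf. mov r2,r1,lsr #5) */
    uint32_t shift = bit & 31u;           /* position of the bit within that word */
    uint32_t old = *word;                 /* load the word (cf. ldr r3,[r0, r2, lsl #2]!) */
    *word = old | (UINT32_C(1) << shift); /* set the bit and store (cf. orr + str) */
    return (old >> shift) & 1u;           /* previous value of the bit (cf. and r0,r12,r3,lsr r1) */
}
```

Calling it twice with the same index on a zeroed set returns false the first time and true the second, and only the word containing that bit changes.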