[fpc-devel] Fwd: Re: ARM/AARCH64 work
J. Gareth Moreton
gareth at moreton-family.com
Thu Aug 6 19:04:25 CEST 2020
(Accidentally sent to Florian privately instead of to the mailing list)
> > Obviously I'll submit the changes as patches so they can be properly reviewed and tested, but does
> > this sound like a good idea?
> Yes, you might have seen, that I started already with this some time ago.
Well, that's good - means I can get straight to work and save you some time, hopefully! AARCH64
doesn't have that much that needs changing in that regard, although ARM needs a lot of clean-up that
I'm getting underway with. There's potential to merge some ARM/AARCH64 optimisations too, which
differ only in the register names (e.g. x29 instead of r11).
I've done a little bit of refactoring of the individual optimisations to allow for small speed-ups.
One of the main ones is an optimisation that converts 4 instructions into one. In the trunk, it
calls GetNextInstruction three times, along with SkipEntryExitMarker a couple of times, and only then
checks whether the individual opcodes and their operands permit the optimisation. I've changed this
around a little so that the first instruction is evaluated first, and only then is GetNextInstruction
called so the next instruction can be checked. GetNextInstruction is a relatively expensive call, and
it's more likely than not that the criteria for the optimisation won't be met (e.g. it comes across a
different instruction), so the sooner you can detect this and drop out, the faster the Peephole
Optimizer will run.
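As a rough illustration of that reordering, here is a small Python sketch. It is not the actual
Free Pascal code: InstrList, get_next, PATTERN and the two match functions are hypothetical names
made up for this example, and get_next merely stands in for an expensive call such as
GetNextInstruction by counting how often it is invoked.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Instr:
    opcode: str


class InstrList:
    """Toy instruction stream that counts 'next instruction' fetches."""

    def __init__(self, opcodes: Sequence[str]) -> None:
        self.items: List[Instr] = [Instr(op) for op in opcodes]
        self.fetches = 0  # number of (pretend-expensive) get_next calls

    def get_next(self, index: int) -> Optional[int]:
        """Hypothetical stand-in for an expensive 'get next instruction' call."""
        self.fetches += 1
        nxt = index + 1
        return nxt if nxt < len(self.items) else None


# Hypothetical 4-instruction pattern; the real optimisation also checks operands.
PATTERN = ("mov", "add", "cmp", "b")


def match_eager(code: InstrList, start: int) -> bool:
    """Fetch all following instructions up front, then check the opcodes."""
    indices = [start]
    for _ in range(len(PATTERN) - 1):
        nxt = code.get_next(indices[-1])
        if nxt is None:
            return False
        indices.append(nxt)
    return all(code.items[i].opcode == op for i, op in zip(indices, PATTERN))


def match_lazy(code: InstrList, start: int) -> bool:
    """Check each instruction before paying for the fetch of the next one."""
    index: Optional[int] = start
    for pos, op in enumerate(PATTERN):
        if index is None or code.items[index].opcode != op:
            return False  # drop out early, no further fetches
        if pos < len(PATTERN) - 1:
            index = code.get_next(index)
    return True
```

Both functions accept or reject the same sequences. The difference only shows up in the fetches
counter: when the very first opcode does not match, match_lazy returns without calling get_next at
all, whereas match_eager has already made all three calls.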
> > P.S. While I haven't been asked to improve aarch64-linux specifically, if I'm understanding things
> > correctly, there should be very few differences with the actual target platform in regards to
> > calling conventions, for example.
> You mean with regard to different aarch64 platforms?
Yes, apologies. I mean in regards to different aarch64 platforms. I've got the basics of the
calling convention down, like with passing the first integral parameter through r/x0 and
In a funny way, my x86-64 machine breaking is proving to be a blessing in disguise, since I'm now
exploring a completely different architecture and learning its assembly language. Some potential
speed-ups, like utilising NEON and other SIMD instruction sets, fall under the more general
philosophy of vectorization and would be much easier to perform once all the nuances with intrinsics
are resolved, because vectorization is much easier to do with nodes than with, say, trying to take a
bunch of assembler instructions and attempting to vectorize those.
Gareth aka. Kit