[fpc-devel] x86_64/amd64 asmcse and peephole optimizer port

Matthias K. makadev at googlemail.com
Sat Oct 30 13:20:48 CEST 2010


Hello,

Over the last few days I have taken a first step towards porting the
i386 data flow analyzer, asmcse (assembler common subexpression
elimination) and peephole optimizations to x86_64.
The main motivation: target-specific instruction-level optimization is
always worthwhile, especially for bottlenecks.

The main goal was to port the i386 optimization code to x86_64
(amd64) and then merge it back, so that the generic x86 optimizations
live in one place.
This is not complete yet. I haven't merged it back, since testing and
review are still to do. From the current point of view, though, it
should be fairly simple to merge the data flow analysis and the asmcse
parts. The peephole part is a different matter: it should stay purely
CPU/target specific.

As stated above, the current approach needs further testing (the FPC
testsuite returns the same results for the patched and unpatched
compiler with "make full", but things may still be missing) and review
by others (hopefully people with more knowledge of the x86_64 code
generator and of potential optimizations). That's why I'm attaching my
current work here.

Attachment contents (diffs against trunk svn r16213, apply in ascending
numerical order):
  a64opt_52.diff - equivalent to copying
    compiler/i386/aopt386.pas into compiler/x86_64/aopta64.pas
    compiler/i386/optcsopt386.pas into compiler/x86_64/optcsopta64.pas
    compiler/i386/daopt386.pas into compiler/x86_64/daopta64.pas
    compiler/i386/popt386.pas into compiler/x86_64/popta64.pas
    compiler/i386/rropt386.pas into compiler/x86_64/rropta64.pas

  a64opt_53.diff - renames the units and enables a64opta in psub for x86_64
  -> compilation is broken since it's not complete

  a64opt_59.diff - does a lot of renaming (mostly RS_E** into RS_R**,
plus other names), adds the R8..R15 registers to most checks, comments
out instruction-specific peephole optimizations, widens some sizes and
adds checks for the S_Q opsize
   -> compilation is broken

  a64opt_72.diff - fixes bugs from the above, adds the peephole
optimizer to the level 1 optimizations and asmcse to the level 2
optimizations, enables some optimizations for A_SHL/A_SAR and a
modified A_AND one (see and_test_com.txt for and/test combinations)

  a64opt_78.diff - enables a modified "small constant imul to lea"
alternative (see opt_const_imul.txt for notes, and AMD's software
optimization guide for Athlon 64 and family 10h processors), and
removes certain optimizations that were never applied (determined
empirically by compiling the RTL and compiler code)

  a64opt_79.diff - enables modified "mov alternatives" and adds
A_MOVSDX in certain places

  a64opt_80.diff - enables mov sequence alternatives

  *.pas and sources/*.s_* are some small tests and the assembler code
generated for them, with a <compilernum>_O3 suffix: 0 is my current
x86_64 (svn r16213) compiler and 80 is the patched one.
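For readers unfamiliar with the "small constant imul to lea" idea: LEA
computes base + index*scale with scale restricted to 1, 2, 4 or 8, so
multiplying a register by 2, 3, 4, 5, 8 or 9 fits into a single LEA.
The Python sketch below only illustrates that general idea. The
function names are hypothetical, and it is not the logic from
a64opt_78.diff or opt_const_imul.txt, which are not reproduced here.

```python
# Hypothetical sketch (not code from the patches): decide whether
# "imul reg, reg, const" can be replaced by a single LEA on x86_64.
# LEA computes base + index*scale, where scale must be 1, 2, 4 or 8.

LEA_SCALES = (1, 2, 4, 8)
MASK64 = (1 << 64) - 1  # both IMUL and LEA keep only the low 64 bits


def lea_for_imul(const, reg="rax"):
    """Return (use_base, scale, intel_asm), or None if one LEA can't do it."""
    if const in (4, 8):
        # index*scale without a base register: reg*4 or reg*8
        return (False, const, f"lea {reg}, [{reg}*{const}]")
    if const - 1 in LEA_SCALES:
        # base + index*scale with base == index: reg + reg*(const-1)
        return (True, const - 1, f"lea {reg}, [{reg}+{reg}*{const - 1}]")
    return None


def eval_lea(form, x):
    """Evaluate the address computation of a form returned by lea_for_imul."""
    use_base, scale, _ = form
    return ((x if use_base else 0) + x * scale) & MASK64


if __name__ == "__main__":
    for c in range(2, 11):
        form = lea_for_imul(c)
        print(c, form[2] if form else "keep imul")
```

A real peephole pass would also weigh latency and encoding size per
CPU (LEA with a scaled index and no base needs a 32-bit displacement),
which is the kind of trade-off the optimization guide mentioned above
discusses; this sketch ignores that.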

TODO: There is potential for further optimizations, especially for x87
and 128-bit media/XOP/FMA4 instructions, but the code needs some
cleanup first and possibly some bug fixes.

I'm open to any feedback, bug fixes and so on (including whether it
should be merged with the i386 parts).

bye,
  Matthias Karbe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: aopt64.zip
Type: application/zip
Size: 120637 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20101030/41fe2e86/attachment.zip>

