[fpc-devel] Good timing metric test program?
J. Gareth Moreton
gareth at moreton-family.com
Tue Feb 26 02:32:14 CET 2019
Well, compiling Lazarus is what I've been doing to test the compiler's
speed, and I've got some promising results:
https://bugs.freepascal.org/view.php?id=34628#c114453
Though the speed of the runs varies a lot depending on what my system is
doing, especially when I switch back and forth between my code and the
unmodified trunk, I get about a 15% speed gain in the compiler and a small
size saving too, mostly due to overhauled jump optimisations.
When it comes to the metric test program, the best comparison I can think
of is those fancy benchmark programs used to test graphics cards and spit
out a score. Compiling Lazarus is good and all, but you can't easily
determine if its compiled code is any more efficient than before, outside
of painstakingly studying the disassembly side-by-side with the control
case. Saying all that, it might be an incentive to design such a test
program that does a number of different operations like multiplying a
vector array by a matrix (this would be a good test case for
vectorisation), generating prime numbers using a Sieve of Eratosthenes
(would test array polling) and converting integers into different bases
(tests how well the compiler can deal with div and mod operations). That
last one matters because, currently, the compiler isn't smart enough to
combine the two operations when they appear together, even though the x86
DIV instruction returns both the quotient and the remainder
simultaneously. Even when dividing by a constant, which gets optimised
into a multiplication using some trickery with how MUL works on x86
processors, if you try to compute the remainder right afterwards, it will
do the multiplication trick again, multiply the resultant quotient by the
divisor, and subtract the result from the original number.
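
As a rough illustration of the arithmetic described above, here is a small Python sketch of division by the constant 10 for 32-bit unsigned values. The magic constant 0xCCCCCCCD with a shift of 35 is the commonly quoted pair for this particular case and is an assumption here, not something taken from FPC's source; the function names are hypothetical. The sketch computes the quotient once and derives the remainder from it, which is the combined form the compiler could in principle emit.

```python
MAGIC = 0xCCCCCCCD  # ceil(2**35 / 10); assumed constant for unsigned 32-bit n / 10
SHIFT = 35


def divmod10(n):
    """Return (n div 10, n mod 10) for 0 <= n < 2**32 without dividing."""
    if not 0 <= n < 2**32:
        raise ValueError("n must fit in an unsigned 32-bit integer")
    q = (n * MAGIC) >> SHIFT  # widening multiply plus shift stands in for DIV
    r = n - q * 10            # remainder reuses q: multiply back and subtract
    return q, r


def to_decimal_string(n):
    """Base conversion built on divmod10, one digit per iteration."""
    digits = []
    while True:
        n, r = divmod10(n)
        digits.append(chr(ord("0") + r))
        if n == 0:
            break
    return "".join(reversed(digits))


if __name__ == "__main__":
    print(divmod10(12345))         # (1234, 5)
    print(to_decimal_string(907))  # 907
```

A base-conversion loop like to_decimal_string needs both results on every iteration, which is why it would make a reasonable test case for this.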
Of course, lots of those already exist as individual test cases, but I
need something more extensive, because a lot of optimisations are very
hard to measure in a small test and need to be part of an extensive
benchmark before the benefits start to show. Those designed to decrease
the chance of pipeline stalls are a good example. I added one in my
optimiser overhaul that turns "mov %reg1,%reg2; mov %reg2,%reg3" into
"mov %reg1,%reg2; mov %reg1,%reg3". I was able to slip it in effectively
for free because another optimisation already checks for the same
arrangement, but only if %reg2 is discarded afterwards, not if it's used
again later.
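
To make the mov rewrite above concrete, here is a toy peephole pass in Python. It is only a sketch under stated assumptions: instructions are modelled as hypothetical (opcode, source, destination) tuples in AT&T operand order, both registers are taken to be the same size, and none of this reflects how FPC's optimiser is actually structured. After the rewrite, the second mov no longer reads the register the first one has just written, so the two can issue independently.

```python
def is_register(operand):
    # AT&T syntax: a plain register operand starts with '%'.
    return operand.startswith("%")


def forward_mov_source(instructions):
    """Rewrite 'mov a,b; mov b,c' into 'mov a,b; mov a,c'.

    instructions is a list of (opcode, source, destination) tuples.
    The first mov is kept, so b still holds its value if used later.
    Only register-to-register first moves are touched, so a memory
    load is never duplicated.
    """
    out = list(instructions)
    for i in range(len(out) - 1):
        op1, src1, dst1 = out[i]
        op2, src2, dst2 = out[i + 1]
        if (op1 == "mov" and op2 == "mov"
                and is_register(src1) and is_register(dst1)
                and src2 == dst1 and src1 != dst1):
            # Read from the original source instead of the fresh copy.
            out[i + 1] = ("mov", src1, dst2)
    return out


if __name__ == "__main__":
    before = [("mov", "%eax", "%ebx"), ("mov", "%ebx", "%ecx")]
    print(forward_mov_source(before))
    # [('mov', '%eax', '%ebx'), ('mov', '%eax', '%ecx')]
```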
Sometimes I get people asking why I'm bothering trying to find the
smallest of savings in size and execution speed - or in my own programming,
writing mathematical functions like the aforementioned matrix
multiplication in raw assembly language for the same benefit - since it's
so much time and effort for very little gain. Truthfully... I enjoy the
challenge! And I'm driven further because I can pass on the benefits to
others.
I do a lot of playing around with mathematics, and when it comes to number
crunching, especially for things that can take weeks to complete (e.g.
Lucas-Lehmer Primality Testing), even a small saving can multiply into an
entire day of saved time. I grew up with Turbo Pascal and then Delphi 2.0
as a pre-teen, and being more of an algorithmic programmer nowadays, I want
to be able to say about FreePascal: "This is a good language for
time-critical functions". Just a little ambition!
Gareth aka. Kit
On Mon 25/02/19 18:41 , "Sven Barth" pascaldragon at googlemail.com sent:
J. Gareth Moreton schrieb am Mo., 25. Feb. 2019, 19:14:
The compiler isn't a valid case because the input source is different
(because of the very changes made to said compiler). It needs to be a
project that doesn't share anything with the compiler (except the run-time
libraries), so the source code is exactly the same so that when it is
built, it runs the same no matter which version of the compiler it was
built with.
I'm viewing it as a bit of a scientific experiment, where only a single
variable is changed, namely the compiler used. The compiled program
should produce exactly the same output and otherwise behave the same way,
so that any time metrics reflect only how long it takes to complete and
hence only the quality of the machine code, not what the
program is doing... if that makes any sense.
You could always build an unmodified compiler with your modified one ;)
Regards, Sven