[fpc-devel] Case block optimisation

Sat Dec 15 18:08:00 CET 2018

 Hi everyone,

 So here's something else I've been playing with in the interim... I've
been working on improving the algorithms used when compiling case blocks. 
It was driven partly by my binary search idea for certain kinds of case
block, which I wrote up over here:
http://wiki.freepascal.org/Case_Compiler_Optimization [1]

 So far, the binary search is a bit inconclusive on timing improvements, so
I need to work on it a bit more and also write some tests to show correct
operation and timings.

 Nevertheless, find attached a preliminary patch that improves other
aspects of code generation:

 - If a case block only contains one branch (not including the else block),
the initial range check is removed, since this becomes wasted effort.- If
the else block is empty, the else label is set to the end label - though
this doesn't increase the code size, it takes a bit of strain off the
peephole optimizer.
 - On -O2 and above, some node analysis is now done on the branch labels. 
Most of the time this just redirects it to the end label for empty blocks,
but if the block contains a goto statement, it will redirect it to its
destination instead, thus increasing performance by not having multiple
jumps (this won't get picked up by the peephole optimiser if the label
addresses are in a jump table).
 - Some checks now use what I call the 'true count' rather than the 'label
count'.  The true count includes each individual value in a range - for
example, 0..2 counts as 3.
 - For jump tables, if the case block almost covers the entire range (32
entries or fewer from full coverage), the initial range check is removed
and the gaps included in the jump table.

 To give an example as to where these improvements take effect, the
"taicpu.calcsize" method in "/x86/aasmcpu.pas" has a large case block that
analyses the encoding of assembler commands - before, this only became a
linear list because while there are loads of entries, the "labelcnt" value
is still under 64 because a number of values are combined into a range. 
This is now changed to an efficient jump table, and furthermore, it is
almost exhaustive because the input type is a Char, and only a handful of
values redirect to the else block, which raises an internal error.  As a
result, the jump table is changed to contain all 256 possible values (1KB
in size), with addresses to the else block where appropriate.

 The patch doesn't contain the binary search yet, but this algorithm covers
situations that are unsuitable for a jump table, namely the domain is large
but the number of valid blocks is very sparse (for example, if the input
type is a Word, a number of small values are checked, and then it checks
the value 10,000 as well, which sits well beyond any other values).  The
jump tree is still used for excessively-large case blocks.

 I won't consider it ready for patching just yet because I want to create
some more tests first.  Still, what do you think?

 Gareth aka. Kit
 (P.S. I hope the patch isn't too large that it gets flagged for
moderation)

Links:
------
[1] http://wiki.freepascal.org/Case_Compiler_Optimization
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20181215/e0b846f3/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: case-improvement.patch
Type: application/octet-stream
Size: 25250 bytes
Desc: not available
URL: <http://lists.freepascal.org/pipermail/fpc-devel/attachments/20181215/e0b846f3/attachment.obj>