<HTML>
<style> BODY { font-family:Arial, Helvetica, sans-serif;font-size:12px; }</style>Hi everyone,<br>
<br>
So here's something else I've been playing with in the interim... I've been working on improving the algorithms used when compiling case blocks. It was driven partly by my binary search idea for certain kinds of case block, which I wrote up over here: <a href="http://wiki.freepascal.org/Case_Compiler_Optimization">http://wiki.freepascal.org/Case_Compiler_Optimization</a><br>
<br>
So far, the binary search is a bit inconclusive on timing improvements, so I need to work on it a bit more and also write some tests to show correct operation and timings.<br>
<br>
Nevertheless, find attached a preliminary patch that improves other aspects of code generation:<br>
<br>
<div>- If a case block only contains one branch (not including the else block), the initial range check is removed, since this becomes wasted effort.</div><div>- If the else block is empty, the else label is set to the end label - though this doesn't increase the code size, it takes a bit of strain off the peephole optimizer.<br>
- On -O2 and above, some node analysis is now done on the branch labels. Most of the time this just redirects it to the end label for empty blocks, but if the block contains a goto statement, it will redirect it to its destination instead, thus increasing performance by not having multiple jumps (this won't get picked up by the peephole optimiser if the label addresses are in a jump table).<br>
</div>- Some checks now use what I call the 'true count' rather than the 'label count'. The true count includes each individual value in a range - for example, 0..2 counts as 3.<br>
- For jump tables, if the case block almost covers the entire range (32
entries or fewer from full coverage), the initial range check is removed
and the gaps included in the jump table.<br>
<br>
To give an example as to where these improvements take effect, the "taicpu.calcsize" method in "/x86/aasmcpu.pas" has a large case block that analyses the encoding of assembler commands - before, this only became a linear list because while there are loads of entries, the "labelcnt" value is still under 64 because a number of values are combined into a range. This is now changed to an efficient jump table, and furthermore, it is almost exhaustive because the input type is a Char, and only a handful of values redirect to the else block, which raises an internal error. As a result, the jump table is changed to contain all 256 possible values (1KB in size), with addresses to the else block where appropriate.<br>
<br>
The patch doesn't contain the binary search yet, but this algorithm covers situations that are unsuitable for a jump table, namely the domain is large but the number of valid blocks is very sparse (for example, if the input type is a Word, a number of small values are checked, and then it checks the value 10,000 as well, which sits well beyond any other values). The jump tree is still used for excessively-large case blocks.<br>
<br>
I won't consider it ready for patching just yet because I want to create some more tests first. Still, what do you think?<br>
<br>
<div>Gareth aka. Kit</div><div><br>
</div><div>(P.S. I hope the patch isn't too large that it gets flagged for moderation)<br>
</div> </HTML>