<HTML>
<div>I figured this is as good a place as any to talk about my attempt at improving the speed of the stage 1 optimizer in issue #33855. On paper, a binary search is the fastest method of locating the data pointer associated with a key (in this case, an opcode). In practice that isn't always true, though, due to overhead and the penalties (or lack thereof) associated with conditional jumps.</div><div><br>
</div><div>The main problem with my binary search implementation is the extra complexity it adds to the code and, as Markov mentioned, the fact that it's limited to x86 rather than being general-purpose. The ideal solution is to tie the binary search and the associated key/data map into the compiler itself, specifically into case node generation, but there are some theoretical issues that need to be resolved:</div><div><br>
</div><div>- What is the minimum number of case nodes at which a binary search should be used over a simpler linear search? The maximum number of iterations of a binary search loop is ceil(log2(n)), where n is the size of the list. Further optimisations with the likes of CMOVcc and loop unrolling can eliminate jumps entirely, but there is still enough overhead that a simple linear search might be faster, especially when the most frequently matched entry is at or near the top of the list.</div><div><br>
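</div><div>To make the trade-off concrete, here is roughly the kind of branchless binary search I have in mind, sketched in C rather than taken from the actual patch (the MapEntry/find_handler names are hypothetical):</div>

```c
#include <stddef.h>

/* Hypothetical sketch: binary search over a map sorted by opcode,
   pairing each opcode with a handler index.  The halving loop is
   written so that the conditional select can lower to CMOVcc on x86,
   leaving no branches inside the loop body. */
typedef struct { int opcode; int handler; } MapEntry;

int find_handler(const MapEntry *map, size_t n, int opcode)
{
    const MapEntry *base = map;
    while (n > 1) {
        size_t half = n / 2;
        /* Branch-free step: a compiler typically emits CMOVcc here. */
        base = (base[half].opcode <= opcode) ? base + half : base;
        n -= half;
    }
    return (n == 1 && base->opcode == opcode) ? base->handler : -1;
}
```

<div>The loop runs at most ceil(log2(n)) times, but for small n the fixed setup cost can still lose to a plain linear scan.</div><div><br>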
</div><div>- How can it be detected that each case node has an identical setup? In the optimizer, the current opcode is passed through a case block, and if a match is found, it is handed to an object method of the form "function(var p: tai): Boolean". Because the function calls are identical, some optimisation can be performed on the common sub-expressions, specifically loading the address of p into RDX or RSI, depending on whether the target is Windows or Linux on x86-64.</div><div><br>
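</div><div>As an illustration of why the identical signatures matter, here is a hedged C sketch (OptProc, dispatch and the example arms are made-up names, and tai merely stands in for the compiler's own type): because every arm shares one signature, the argument registers (RCX/RDX on Win64, RDI/RSI under the SysV ABI) can be loaded once before the indirect call rather than once per case arm.</div>

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct tai_ tai;            /* stand-in for the optimizer's tai type */
typedef bool (*OptProc)(tai **p);   /* "function(var p: tai): Boolean"       */

/* Example arms with the shared signature (illustration only). */
static bool example_pass(tai **p) { (void)p; return true;  }
static bool example_fail(tai **p) { (void)p; return false; }

bool dispatch(OptProc table[], size_t n, size_t idx, tai **p)
{
    /* One load of p into the argument register serves every arm. */
    if (idx >= n || table[idx] == NULL)
        return false;
    return table[idx](p);
}
```

<div>Any arm whose signature differs cannot share the table or the hoisted argument setup.</div><div><br>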
</div><div>- What happens if there is an imperfection in the source code? A vague question, I know, but in this case some of the functions are of the form "function(const p: tai): Boolean", which requires a slightly different parameter set-up. This is not an error - the functions are sometimes written this way if p never changes (i.e. if that particular instruction is never deleted by the optimizer) - but it breaks the homogeneity of the case nodes and hence would make a highly-optimized search much harder. Should the programmer be warned, or given a hint on how to improve their code, or is it up to us to be smart enough to spot "const p: tai" instead of "var p: tai"?<br>
<br>
- How easy would it be to make such a binary search more general-purpose? My particular code simply uses a map that pairs an opcode with a pointer in a many-to-one relationship. The algorithm doesn't care what the pointer is: it could just as easily be a label address, an object or a generic block of memory on the heap instead of a procedure address. Granted, as mentioned, the situation where each case node is a function call with an identical setup is a special case, because jumps can be eliminated entirely.</div><div><br>
</div><div>I think that's everything anyway. I admit my motivations have slightly nostalgic roots: I remember when the main selling point of Turbo Pascal 7.0 was the speed of its compiler, and I'm harking back to those days - likewise the size of the binaries produced compared to, say, Microsoft Visual C++. Of course, size isn't as big a concern as it was back when you were limited to a 1.44 MB floppy disk, but if there's anywhere I can eliminate or restructure inefficient or superfluous code, I'll take it!</div><div><br>
</div><div>Could someone fill me in on how case nodes currently work? I know there's a kind of linear search where a value corresponding to the next case node is subtracted from the ordinal variable, and there's something called a "jump table", but I'm not entirely sure how that works yet.</div><div><br>
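</div><div>For reference, my current (possibly wrong) mental model of a jump table, sketched in C with function pointers standing in for the label addresses the compiler would actually emit (run_case and the arm names are invented for the example): the lowest case value is subtracted, a single unsigned compare range-checks the result, and the remainder indexes a table of code addresses.</div>

```c
/* Rough illustration of a jump table for the dense case range 10..13.
   Function pointers stand in for emitted label addresses. */
enum { CASE_LO = 10, CASE_HI = 13 };

static int arm_a(void)       { return 1; }
static int arm_b(void)       { return 2; }
static int arm_default(void) { return 0; }

static int (*const jump_table[])(void) = {
    arm_a,        /* case 10 */
    arm_b,        /* case 11 */
    arm_default,  /* case 12: gap filled with the else arm */
    arm_a,        /* case 13 */
};

int run_case(int value)
{
    unsigned idx = (unsigned)(value - CASE_LO); /* one subtraction...        */
    if (idx > CASE_HI - CASE_LO)                /* ...one unsigned compare   */
        return arm_default();                   /* out of range: else branch */
    return jump_table[idx]();                   /* indirect jump via table   */
}
```

<div>One subtract, one compare and one indirect jump regardless of the number of cases - at the cost of table space proportional to the range, which is presumably why it only pays off for dense case values.</div><div><br>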
</div><div>Gareth aka. Kit<br>
</div></HTML>