[fpc-devel] Optimizations

Fri Jan 24 00:26:35 CET 2014

24.01.2014 3:04, Martin Frb wrote:
> On 23/01/2014 22:26, August Oktobar wrote:
>> Hello, I have seen your mails about peephole optimization, so I wonder if you could look at this
>> report
>> http://bugs.freepascal.org/view.php?id=23595
>>
>> or perhaps optimize slow array access using operator [] (it is faster to use pointer arithmetic)
>>
>> thanks!
>
> 1) I am just getting started on this, so I can only answer with limited knowledge.
>      @experts, please correct me below, where needed
> 2) The peephole opt is only something I do "on the side" as it currently is.
>
>  From a quick glance, this does not look like something for the peephole opt.
>
> The peephole opt currently looks at statements that are close together (follow each other immediately).
> I am not sure to what extent (if at all) it would be acceptable to break that limit (it is doable;
> the question is whether it is desired).
>
> In this specific case:
> 1) between the "fstpl" and the "mov (half_the_data), %edi" there are other statements.
>    detecting the connection would either:
>    - need scanning several statements ahead. This would be slow, because it would have to be done after
> each store of "something to memory" (so very often)
>    - need keeping state of all the involved registers and memory (doable / interesting at least from a
> theoretical view / but not sure if desired)
> 2) only half of the mem is accessed, and then the other half. That means that to detect the connection
> between the mem read and the register, 4 statements need to be analysed. Very unlikely to see
> this in the peephole opt.
>
> The liveness of "a[i]" needs to be checked where the code is generated.
>

1) You are right that this is not a job for the peephole optimizer; it is a typical case of common
subexpression elimination.
2) The reporter's assumption about fstp is wrong: the first fstp instruction pops the value off the
FPU stack, so it cannot be used a second time without first reloading the value onto the stack.
3) Assignments of floating-point values are currently generated using integer
instructions, hence the subsequent code. This way the generated code doesn't depend on the number of
available FPU registers, which is hard to know at any given point.
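
To illustrate point 2, here is a minimal sketch of the store-and-pop behaviour, written as a toy
Python model rather than real x87 code. The class ToyFpuStack and the symbolic memory names "a" and
"b" are made up for this example; a real FPU has eight registers and signals stack underflow
differently, so take it only as a picture of why the second fstp has nothing to store.

```python
class ToyFpuStack:
    """Toy model of the x87 register stack (hypothetical, not an emulator)."""

    def __init__(self):
        self.stack = []    # top of stack, ST(0), is the last list element
        self.memory = {}   # symbolic memory: name -> value (assumption)

    def fld(self, value):
        """Push a value onto the stack, like fld."""
        self.stack.append(value)

    def fst(self, dest):
        """Store ST(0) to memory and leave it on the stack, like fst."""
        self.memory[dest] = self.stack[-1]

    def fstp(self, dest):
        """Store ST(0) to memory and then pop it, like fstp."""
        self.memory[dest] = self.stack.pop()


# Two fstp in a row: the first one pops the value, the second finds nothing.
fpu = ToyFpuStack()
fpu.fld(1.5)
fpu.fstp("a")
try:
    fpu.fstp("b")
    second_store_ok = True
except IndexError:
    second_store_ok = False

# Storing twice works only if the first store does not pop (fst, then fstp).
fpu2 = ToyFpuStack()
fpu2.fld(1.5)
fpu2.fst("a")
fpu2.fstp("b")
```

In the first sequence only "a" ends up in memory; in the second both "a" and "b" are written and the
stack is empty again afterwards.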

Regards,
Sergei