<p>On 19.05.2017 at 16:39, "Karoly Balogh (Charlie/SGR)" <<a href="mailto:charlie@scenergy.dfmk.hu">charlie@scenergy.dfmk.hu</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> On Fri, 19 May 2017, Reimar Grabowski wrote:<br>
><br>
> > Final: The render function takes about 90%, the cast-to-int about 5%. No<br>
> > other interesting functions shown. So the missing time must be spent<br>
> > doing floating point math and branching (ifs), as that's all the render<br>
> > function does.<br>
><br>
> Well, if I comment out the three additions where the ray is actually<br>
> traced and the tex := line, it runs at 60fps on my MacBook. But the<br>
> real difference is made by the additions. If I comment out everything<br>
> else but leave those 3 (4, in fact) additions in, it's still<br>
> slow.<br>
><br>
> Which got me thinking. I think you can vectorize that quite easily and<br>
> use some packed SIMD instructions; maybe that will make a difference. C/C++<br>
> has some compiler intrinsics for that. I can't remember off the top of my<br>
> head if it's doable with FPC. Someone who feels like fiddling with this<br>
> might want to try some assembly magic there, if it's possible somehow...</p>
<p>I think Jeppe wanted to add vector support. The question here, though, is whether one wants to detect and optimize this at the AST level, converting it to implicit vectors, or at the CSE level.</p>
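<p>For illustration, here is a rough sketch in C (since Charlie mentioned the C/C++ intrinsics) of what packing the ray-step additions into one SIMD instruction could look like. The names <code>vec4</code> and <code>ray_step</code> are made up for this example and are not taken from the actual render function; the sketch assumes the position and the per-step delta can each be padded to four floats. It falls back to scalar code where SSE is not available.</p>

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* Hypothetical ray state: three coordinates plus one spare lane,
   so that it fills exactly one 128-bit SSE register. */
typedef struct { float v[4]; } vec4;

/* Advance the ray by one step. With SSE the four additions become a
   single packed add; otherwise they are done one lane at a time. */
static vec4 ray_step(vec4 pos, vec4 delta)
{
    vec4 out;
#ifdef HAVE_SSE
    __m128 p = _mm_loadu_ps(pos.v);            /* unaligned load, 4 floats */
    __m128 d = _mm_loadu_ps(delta.v);
    _mm_storeu_ps(out.v, _mm_add_ps(p, d));    /* 4 additions at once */
#else
    for (int i = 0; i < 4; i++)
        out.v[i] = pos.v[i] + delta.v[i];
#endif
    return out;
}
```

<p>Whether this actually helps depends on how the surrounding code keeps the values in registers; if each step has to shuffle data in and out of the vector, the gain may vanish.</p>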
<p>By the way: I think my commit today of an SSE Frac() implementation sped up the framerate by a third on Win64 compared to the one without it :D</p>
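<p>To give an idea of what such a routine can look like, here is a minimal sketch in C with SSE intrinsics. This is not the code from the commit, just one plausible shape: truncate to an integer, convert back, and subtract. The name <code>frac_sse</code> is made up, and the sketch only works while the truncated value fits into a 32-bit integer.</p>

```c
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>   /* SSE intrinsics */
#define HAVE_SSE 1
#endif

/* Fractional part of x with the sign of x, i.e. x - trunc(x),
   which is what Pascal's Frac() returns.
   Assumption: trunc(x) fits into a 32-bit signed integer. */
static float frac_sse(float x)
{
#ifdef HAVE_SSE
    __m128 v = _mm_set_ss(x);
    int whole = _mm_cvttss_si32(v);                      /* truncate toward zero */
    __m128 t = _mm_cvtsi32_ss(_mm_setzero_ps(), whole);  /* integer back to float */
    return _mm_cvtss_f32(_mm_sub_ss(v, t));              /* x - trunc(x) */
#else
    return x - (float)(int)x;                            /* scalar fallback */
#endif
}
```

<p>For example, <code>frac_sse(2.75f)</code> gives <code>0.75f</code> and <code>frac_sse(-1.5f)</code> gives <code>-0.5f</code>.</p>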
<p>Regards,<br>
Sven</p>