[fpc-pascal] Implementing AggPas with PtcGraph
james at productionautomation.net
Thu Jun 22 13:59:13 CEST 2017
>That sounds like a little bit of a special case - it'll work where you're using putimage for a large area, that has very few pixels set.
That is exactly what I have almost all the time. I want to use putimage for the entire screen on every update, but very few pixels on the screen change at any given update. I have already tried using putimage for only part of the screen. That helps sometimes, but most of the time it doesn’t help enough, because I’ll be drawing a long diagonal line or a big ellipse, and the rectangle needed to enclose it ends up covering a good portion of the screen. By the time I have finished calculating the only slightly smaller area, I might as well have just done putimage on the entire screen.
>Perhaps just reimplementing the general algorithm in inline asm, by using SSE (or MMX) vector instructions would be the fastest
That sounds completely over my head 😊
>but maybe it's not worth the pain
Maybe not… I’m pretty sure I can handle processing the second array, but how and where to create it in aggpas, I have no idea yet. I have not actually looked at how aggpas puts data in the buffer. It’s a huge package and I’m not even sure what unit that happens in.
>and the pascal implementation is fast enough for you.
It’s not quite fast enough yet…
>Just experiment and see what works best :)
Sounds like fun 😊 Maybe I’ll do a test and pre-build the second array myself, just to see if there is any real benefit to this whole idea. If there is, then I’ll try to figure out how to do it with aggpas.
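A test along those lines could be sketched as below. This is only an illustration under assumptions: the "second array" is taken to be a list of buffer indices of the pixels that are set, the buffers are plain 16bpp arrays rather than the real putimage bitmap, and every name (Image, ScreenBuf, SetPixels) is hypothetical.

```pascal
program SparseBlit;
{$mode objfpc}

const
  W = 320;
  H = 200;

type
  TFrame = array[0..W*H-1] of Word;   // hypothetical 16bpp frame buffer

var
  Image, ScreenBuf: TFrame;
  SetPixels: array of LongInt;        // the assumed "second array": indices of set pixels
  Count, k, i: LongInt;

begin
  FillChar(Image, SizeOf(Image), 0);
  FillChar(ScreenBuf, SizeOf(ScreenBuf), 0);

  // Stand-in for the drawing step: one diagonal line, so very few pixels are set.
  for i := 0 to H-1 do
    Image[i*W + i] := $FFFF;

  // Pre-build the index list by scanning the image once.
  SetLength(SetPixels, W*H);
  Count := 0;
  for k := 0 to W*H-1 do
    if Image[k] <> 0 then
    begin
      SetPixels[Count] := k;
      Inc(Count);
    end;

  // Sparse update: touch only the listed pixels instead of all W*H of them.
  for i := 0 to Count-1 do
    ScreenBuf[SetPixels[i]] := Image[SetPixels[i]];

  WriteLn('pixels copied: ', Count, ' of ', W*H);
end.
```

Note that the scan that builds the list visits every pixel, so it costs about as much as a full copy. The idea only pays off if the list is filled in while drawing, which is why it would have to be created inside aggpas.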
>Btw, I looked at your code again and saw a quick and cheap optimization - just move the case statement (case BitBlt of) outside the inner loop (for i:=X to X1 do), so the value of BitBlt is not checked once every pixel, but once per row.
Great idea. I took it one step further: wanting it to be as fast as possible, I now check BitBlt only once for the entire nested loop. I also made a combined procedure for both 8bpp and 16bpp. This is about 7% faster. You can see it here:
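The code from the original message was not preserved in the archive. As a rough illustration of the technique only, and not the poster's actual procedure, the sketch below compares a copy that evaluates the case for every pixel with one that evaluates it once for the whole image. TBlitMode, TBuf and both procedure names are hypothetical, and the buffer is a small flat array rather than a real putimage bitmap.

```pascal
program HoistCase;
{$mode objfpc}

const
  W = 8;
  H = 4;

type
  TBlitMode = (bmCopy, bmXor, bmOr, bmAnd);   // hypothetical stand-in for the BitBlt parameter
  TBuf = array[0..W*H-1] of Word;

// Baseline: the case statement is evaluated once per pixel.
procedure BlitPerPixel(const Src: TBuf; var Dst: TBuf; Mode: TBlitMode);
var
  x, y, k: Integer;
begin
  k := 0;
  for y := 0 to H-1 do
    for x := 0 to W-1 do
    begin
      case Mode of
        bmCopy: Dst[k] := Src[k];
        bmXor:  Dst[k] := Dst[k] xor Src[k];
        bmOr:   Dst[k] := Dst[k] or Src[k];
        bmAnd:  Dst[k] := Dst[k] and Src[k];
      end;
      Inc(k);
    end;
end;

// Hoisted: the case is evaluated once, and each branch owns its loop.
procedure BlitHoisted(const Src: TBuf; var Dst: TBuf; Mode: TBlitMode);
var
  k: Integer;
begin
  case Mode of
    bmCopy: for k := 0 to W*H-1 do Dst[k] := Src[k];
    bmXor:  for k := 0 to W*H-1 do Dst[k] := Dst[k] xor Src[k];
    bmOr:   for k := 0 to W*H-1 do Dst[k] := Dst[k] or Src[k];
    bmAnd:  for k := 0 to W*H-1 do Dst[k] := Dst[k] and Src[k];
  end;
end;

var
  Src, A, B: TBuf;
  Mode: TBlitMode;
  k: Integer;
  Same: Boolean;

begin
  Same := True;
  for Mode := Low(TBlitMode) to High(TBlitMode) do
  begin
    // Same starting data for both versions.
    for k := 0 to W*H-1 do
    begin
      Src[k] := Word(k * 257);
      A[k] := Word(k * 31 + 7);
      B[k] := A[k];
    end;
    BlitPerPixel(Src, A, Mode);
    BlitHoisted(Src, B, Mode);
    for k := 0 to W*H-1 do
      if A[k] <> B[k] then
        Same := False;
  end;
  WriteLn('results identical: ', Same);
end.
```

Because this sketch copies a whole buffer, each branch can use a single flat loop. For a sub-rectangle (X to X1, Y to Y1) each branch would keep its own pair of row and column loops, which is the price of duplicating the loop for every mode.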
>Try rearranging that like this:
>Note that all array calculation and the case is removed from the inner most loop, at the expense of duplicating the for loop.
>The index is not used in the for loop and made 0 based to allow the tightest FOR loop code generation.
I also tried this, but for some strange reason it’s slower, clocking in at 1.773s for my 1000x loop instead of 1.056s. Maybe I did something wrong. Here is what I did:
Maybe the two inc(pdest); inc(psrc); inside the inner loop are slower than the inc(k)?
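The code that was posted here is not in the archive either. One way to check the question in isolation is to time the two inner-loop styles on a plain copy. The sketch below is a minimal benchmark under assumptions: the 640x480 16bpp buffer, the procedure names and the repetition count are all made up for the test, and GetTickCount64 from SysUtils only gives millisecond resolution. Timings depend on compiler version and optimization switches, so no result is claimed here.

```pascal
program PtrVsIndex;
{$mode objfpc}

uses
  SysUtils;

const
  W = 640;       // assumed screen size, for the test only
  H = 480;
  Reps = 1000;   // mirrors the 1000x loop mentioned above

var
  Src, Dst: array of Word;

// Indexed version: one counter, two array accesses per pixel.
procedure CopyIndexed;
var
  k: Integer;
begin
  for k := 0 to W*H-1 do
    Dst[k] := Src[k];
end;

// Pointer version: the loop counter is not used as an index,
// both pointers are advanced by hand inside the loop.
procedure CopyPointers;
var
  pSrc, pDst: PWord;
  i: Integer;
begin
  pSrc := @Src[0];
  pDst := @Dst[0];
  for i := 0 to W*H-1 do
  begin
    pDst^ := pSrc^;
    Inc(pDst);   // advances by SizeOf(Word)
    Inc(pSrc);
  end;
end;

var
  r, k: Integer;
  t0: QWord;

begin
  SetLength(Src, W*H);
  SetLength(Dst, W*H);
  for k := 0 to W*H-1 do
    Src[k] := Word(k and $FFFF);

  t0 := GetTickCount64;
  for r := 1 to Reps do
    CopyIndexed;
  WriteLn('indexed : ', GetTickCount64 - t0, ' ms');

  t0 := GetTickCount64;
  for r := 1 to Reps do
    CopyPointers;
  WriteLn('pointers: ', GetTickCount64 - t0, ' ms');
end.
```

Running it with and without optimization (for example -O2) shows whether the difference comes from the two Inc calls themselves or from how the compiler handles each loop.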