[fpc-pascal] Implementing AggPas with PtcGraph

Nikolay Nikolov nickysn at gmail.com
Thu Jun 22 01:46:50 CEST 2017

On 06/22/2017 02:42 AM, Nikolay Nikolov wrote:
> On 06/22/2017 01:21 AM, James Richters wrote:
>>> putimage can be accelerated, although it would still have to do a 
>>> memory copy.
>> Like this?
>> https://github.com/Zaaphod/ptcpas/compare/Zaaphod_Custom?expand=1#diff-fb31461e009ff29fda5c35c5115978b4 
>> This is amazingly faster. I ran a test of just ptcgraph.putimage() 
>> in a loop, putting the same image 1000 times and timing it. The 
>> original ptcgraph.putimage() took 18.017 seconds. After I applied 
>> this, the same loop took 1.056 seconds, about 17 times faster. Quite 
>> an improvement! It's still nowhere near as fast as just drawing stuff 
>> with ptcgraph directly, but for doing a memory copy of the entire 
>> screen, it's very fast.
> Yes, that's a good start. That was exactly what I meant :)
>> I have an idea on how I could speed it up even further....
>> If I set up a second array with 1 bit per pixel, then (somehow) 
>> aggpas could set bits in this array to 1 whenever it changed the 
>> corresponding pixel. Now by analyzing the 'pixel changed' array one 
>> word at a time (or maybe a longword or qword at a time), I could just 
>> skip over all the words that = 0, and when I come across a word that 
>> is <> 0 I could do a binary search of that word to change only the 
>> pixels that need to be changed. If very little on the screen has 
>> changed, this would be quite a bit faster, because the pixel changed 
>> array would be 1/16 the size of the full buffer (1 bit per pixel 
>> instead of 16).
>> The only way this would be of any benefit though is if aggpas set the 
>> bits in the 'pixel changed' array while it was changing the pixels of 
>> the buffer, because at that time it already has the array position 
>> and the fact that something changed available.  If I had to analyze 
>> the buffer separately and create the 'pixels changed' array, it would 
>> take too long.
> That sounds like a little bit of a special case - it'll work where 
> you're using putimage for a large area, that has very few pixels set. 
> Perhaps just reimplementing the general algorithm in inline asm, by 
> using SSE (or MMX) vector instructions would be the fastest, but maybe 
> it's not worth the pain and the pascal implementation is fast enough 
> for you. Just experiment and see what works best :)
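
For reference, the 'pixel changed' idea could be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not AggPas or ptcgraph code: SetPixel, FlushDirty, Buffer, Display and Dirty are hypothetical names, the "screen" is a plain array of 16-bit pixels, and a simple bit-by-bit scan stands in for the binary search of each non-zero word.

```pascal
program DirtyBits;
{$mode objfpc}

const
  PixelCount = 256;                 { hypothetical tiny "screen" }
  WordCount  = PixelCount div 64;   { one QWord flags 64 pixels }

var
  Buffer, Display: array[0..PixelCount - 1] of Word;  { 16 bits per pixel }
  Dirty: array[0..WordCount - 1] of QWord;            { 1 bit per pixel }

{ What the renderer would have to do on every pixel write (assumption). }
procedure SetPixel(Index: Integer; Color: Word);
begin
  Buffer[Index] := Color;
  Dirty[Index shr 6] := Dirty[Index shr 6] or (QWord(1) shl (Index and 63));
end;

{ Copy only the flagged pixels to Display; returns how many were copied. }
function FlushDirty: Integer;
var
  w, Base, Bit: Integer;
  Bits: QWord;
begin
  Result := 0;
  for w := 0 to WordCount - 1 do
  begin
    Bits := Dirty[w];
    if Bits = 0 then
      Continue;                     { 64 unchanged pixels skipped with one test }
    Base := w * 64;
    for Bit := 0 to 63 do
      if (Bits and (QWord(1) shl Bit)) <> 0 then
      begin
        Display[Base + Bit] := Buffer[Base + Bit];
        Inc(Result);
      end;
    Dirty[w] := 0;                  { clear the flags for the next frame }
  end;
end;

begin
  SetPixel(3, $F800);
  SetPixel(200, $07E0);
  WriteLn(FlushDirty);   { two pixels were flagged, so two are copied }
  WriteLn(FlushDirty);   { flags were cleared, so nothing is copied }
end.
```

Whether this beats a plain block copy depends on how sparse the changes are, since every pixel write now also pays for a flag update.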
Btw, I looked at your code again and saw a quick and cheap optimization: 
just move the case statement (case BitBlt of) outside the inner loop 
(for i:=X to X1 do), so that the value of BitBlt is checked once per 
row instead of once per pixel.
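
A rough, self-contained sketch of what hoisting the case statement looks like. It does not use the real ptcgraph types: TBlitMode, TRow, BlitRowSlow and BlitRowFast are hypothetical stand-ins, and only four put modes are shown.

```pascal
program HoistCase;
{$mode objfpc}

const
  RowLen = 8;

type
  { Hypothetical stand-ins for the BitBlt put modes. }
  TBlitMode = (bmCopy, bmXor, bmOr, bmAnd);
  TRow = array[0..RowLen - 1] of Word;

{ Before: the mode is tested once for every pixel. }
procedure BlitRowSlow(const Src: TRow; var Dst: TRow; Mode: TBlitMode);
var
  i: Integer;
begin
  for i := 0 to RowLen - 1 do
    case Mode of
      bmCopy: Dst[i] := Src[i];
      bmXor:  Dst[i] := Dst[i] xor Src[i];
      bmOr:   Dst[i] := Dst[i] or Src[i];
      bmAnd:  Dst[i] := Dst[i] and Src[i];
    end;
end;

{ After: the mode is tested once per row; each branch has its own tight loop. }
procedure BlitRowFast(const Src: TRow; var Dst: TRow; Mode: TBlitMode);
var
  i: Integer;
begin
  case Mode of
    bmCopy: Move(Src, Dst, SizeOf(TRow));   { plain copy needs no loop at all }
    bmXor:  for i := 0 to RowLen - 1 do Dst[i] := Dst[i] xor Src[i];
    bmOr:   for i := 0 to RowLen - 1 do Dst[i] := Dst[i] or Src[i];
    bmAnd:  for i := 0 to RowLen - 1 do Dst[i] := Dst[i] and Src[i];
  end;
end;

var
  Src, A, B: TRow;
  Mode: TBlitMode;
  i: Integer;
  Same: Boolean;
begin
  { Check that both versions produce the same row for every mode. }
  Same := True;
  for Mode := Low(TBlitMode) to High(TBlitMode) do
  begin
    for i := 0 to RowLen - 1 do
    begin
      Src[i] := Word(i * 4099);
      A[i] := Word(i * 257 + 1);
      B[i] := A[i];
    end;
    BlitRowSlow(Src, A, Mode);
    BlitRowFast(Src, B, Mode);
    for i := 0 to RowLen - 1 do
      if A[i] <> B[i] then
        Same := False;
  end;
  if Same then
    WriteLn('identical')
  else
    WriteLn('MISMATCH');
end.
```

The copy case can then collapse into a single Move per row, which is probably where most of the gain would come from.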
