[fpc-pascal] FPImage and GetDataLineStart

Thu Apr 21 16:00:15 CEST 2011

In our previous episode, Leonardo M. Ramé said:
>> IIRC I accelerated loading/saving simple 8-bit BMP images
>> 20 to 50 times in
>> my work code.
>> 

> Do you care to share some insights about what you did to accelerate it?

- Made it 8bpp only.
- Introduced line-based writing.
- Since my images are already 32-bit aligned, just blockwrite their pixel data
  in one run.
- I have no backwards compatibility limitations. IOW the images are not GDI
  compatible, since drawing goes via OpenGL.
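The points above can be sketched roughly as follows. This is not the code from the snapshot, only a minimal illustration of writing an 8bpp BMP with a single blockwrite. It assumes a little-endian machine, a gray palette, a buffer whose rows are already padded to 32 bits, and rows stored top-down (hence the negative height in the header). The record names, field names and WriteGray8Bmp are made up for this example.

```pascal
program bmp8write;
{$mode objfpc}

type
  // BMP file header, 14 bytes (field names are illustrative)
  TBmpFileHeader = packed record
    bfType     : Word;      // 'BM' = $4D42
    bfSize     : LongWord;  // total file size in bytes
    bfReserved : LongWord;
    bfOffBits  : LongWord;  // offset of the pixel data
  end;

  // BMP info header, 40 bytes
  TBmpInfoHeader = packed record
    biSize          : LongWord;
    biWidth         : LongInt;
    biHeight        : LongInt;  // negative means rows are stored top-down
    biPlanes        : Word;
    biBitCount      : Word;
    biCompression   : LongWord;
    biSizeImage     : LongWord;
    biXPelsPerMeter : LongInt;
    biYPelsPerMeter : LongInt;
    biClrUsed       : LongWord;
    biClrImportant  : LongWord;
  end;

procedure WriteGray8Bmp(const FileName: string; Pixels: PByte;
                        Width, Height: LongInt);
var
  f      : file;
  fh     : TBmpFileHeader;
  ih     : TBmpInfoHeader;
  pal    : array[0..255] of LongWord;
  stride : LongInt;
  i      : Integer;
begin
  stride := (Width + 3) and not 3;      // bytes per row, padded to 32 bits
  for i := 0 to 255 do
    pal[i] := LongWord(i) * $010101;    // B, G, R, reserved: gray ramp

  FillChar(fh, SizeOf(fh), 0);
  fh.bfType    := $4D42;
  fh.bfOffBits := SizeOf(fh) + SizeOf(ih) + SizeOf(pal);
  fh.bfSize    := fh.bfOffBits + LongWord(stride) * LongWord(Height);

  FillChar(ih, SizeOf(ih), 0);
  ih.biSize      := SizeOf(ih);
  ih.biWidth     := Width;
  ih.biHeight    := -Height;            // assumption: buffer is top-down
  ih.biPlanes    := 1;
  ih.biBitCount  := 8;
  ih.biSizeImage := LongWord(stride) * LongWord(Height);
  ih.biClrUsed   := 256;

  Assign(f, FileName);
  Rewrite(f, 1);                        // untyped file, record size 1 byte
  BlockWrite(f, fh, SizeOf(fh));
  BlockWrite(f, ih, SizeOf(ih));
  BlockWrite(f, pal, SizeOf(pal));
  BlockWrite(f, Pixels^, stride * Height);  // all pixel data in one run
  Close(f);
end;

var
  buf : array of Byte;
  w, h, stride, x, y : LongInt;
begin
  w := 6; h := 4;
  stride := (w + 3) and not 3;
  SetLength(buf, stride * h);           // padding bytes stay zero
  for y := 0 to h - 1 do
    for x := 0 to w - 1 do
      buf[y * stride + x] := (x * 40 + y * 10) and $FF;
  WriteGray8Bmp('test8.bmp', @buf[0], w, h);
end.
```

The single BlockWrite only works because the in-memory layout already matches the file layout. If the rows were not padded, or were stored in the opposite order from what the header declares, you would be back to writing line by line.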

For me all of this was important, since writing BMPs was the only
non-zero-copy part of my image processing pipeline.

I have a newer setup based on generics that uses one template for 8-, 16- and
32-bit images. It is not finished though, and uses Delphi-style generics.

The code is written for Delphi, but when I need it I port it to FPC too. The
generic one hasn't been ported to FPC.

Since all of these are based on fcl-image code, they can be open sourced, but
they need to be extracted and cleaned up first. I put a snapshot here:

http://www.stack.nl/~marcov/bimagerelease.zip

Notes:
  - baseimage is the 8bpp class that is in production. BMP writing is
    integrated. pnghandler is a simple base class to read/write PNGs.
  - baseimagegen is the same, but redone with generics. I use it mostly for
    experiments; it is NOT in production use, and it is written in the Delphi
    generics dialect (which FPC doesn't support yet).
  - pbbyte is a pointer to a one-byte value that can be overindexed (like x[2]).
    It is pbyte on Delphi 2009+ and FPC, and pchar on older Delphis.
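A declaration along these lines would give that behaviour. It is only a guess at how pbbyte might be defined, not a copy from the snapshot. The CompilerVersion check assumes a Delphi that knows {$IF} (Delphi 6 or later), with 20.0 being Delphi 2009.

```pascal
program pbbytedemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

type
{$IFDEF FPC}
  pbbyte = PByte;                // FPC allows indexing typed pointers
{$ELSE}
  {$IF CompilerVersion >= 20.0}  // Delphi 2009 and later
  pbbyte = PByte;
  {$ELSE}
  pbbyte = PChar;                // one-byte char on pre-2009 Delphi
  {$IFEND}
{$ENDIF}

var
  buf : array[0..3] of Byte;
  p   : pbbyte;
begin
  buf[0] := 10; buf[1] := 20; buf[2] := 30; buf[3] := 40;
  p := pbbyte(@buf[0]);
  WriteLn(Ord(p[2]));            // reads the third byte through the pointer
end.
```

Ord() is used so that the same line compiles whether the element type is a byte or a char.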

The baseimagegen was mostly a proof of concept. It derives a generic class
from a non-generic base class. This allows using the generic derived classes
in performance-sensitive code, and the general base class for cases where it
matters less.

The generic code has some nested types that allow processing code to be
somewhat independent of pixel size while still being optimal.
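To make the idea concrete, here is a stripped-down sketch of such a hierarchy in Delphi-style generics. It is not the baseimagegen code. TBaseImage, TGenImage and the field names are invented for this example; only the names reft, baseunit, getimagepointer, imagewidth and imageheight follow the usage in this mail. It assumes a compiler with Delphi generics and nested types (Delphi 2009+, or an FPC that accepts them in {$mode delphi}).

```pascal
program genimgdemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Non-generic base class: usable wherever the pixel size does not matter.
  TBaseImage = class
  protected
    FWidth, FHeight, FStride : Integer;   // FStride is in bytes
    FData : array of Byte;
  public
    constructor Create(AWidth, AHeight, BytesPerPixel: Integer);
    property ImageWidth  : Integer read FWidth;
    property ImageHeight : Integer read FHeight;
  end;

  // Generic layer: the nested types name the pixel type and its pointer type.
  TGenImage<T> = class(TBaseImage)
  public
    type
      BaseUnit = T;
      RefT     = ^T;
    constructor Create(AWidth, AHeight: Integer);
    function GetImagePointer(x, y: Integer): RefT;
  end;

  TBW8Image  = TGenImage<Byte>;
  TBW16Image = TGenImage<Word>;

constructor TBaseImage.Create(AWidth, AHeight, BytesPerPixel: Integer);
begin
  inherited Create;
  FWidth  := AWidth;
  FHeight := AHeight;
  FStride := (AWidth * BytesPerPixel + 3) and not 3;  // rows padded to 32 bits
  SetLength(FData, FStride * FHeight);
end;

constructor TGenImage<T>.Create(AWidth, AHeight: Integer);
begin
  inherited Create(AWidth, AHeight, SizeOf(T));
end;

function TGenImage<T>.GetImagePointer(x, y: Integer): RefT;
begin
  // Valid for 0 <= x < ImageWidth and 0 <= y < ImageHeight.
  Result := RefT(@FData[y * FStride + x * SizeOf(T)]);
end;

var
  img     : TBW8Image;
  p, pend : TBW8Image.RefT;
  y, sum  : Integer;
begin
  img := TBW8Image.Create(4, 2);
  try
    sum := 0;
    for y := 0 to img.ImageHeight - 1 do
    begin
      p := img.GetImagePointer(0, y);
      pend := p;
      Inc(pend, img.ImageWidth);          // one past the last pixel of the row
      while p <> pend do
      begin
        p^ := y + 1;                      // write a per-row test value
        sum := sum + p^;
        Inc(p);
      end;
    end;
    WriteLn(sum);
  finally
    img.Free;
  end;
end.
```

Switching the demo to TBW16Image only changes the two type names in the var block; the loop itself stays the same, which is the point of exposing the nested types.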

Typical code would look like this: 

procedure something (img :TBW8Image);  // 8 bit grayscale 

var ppixel,ppixelend : img.reft; // pointer type always the same as the parameter expects
    x, y : integer;
    pixvalue : img.baseunit;  // whatever our pixel type is.
begin

  for y:=0 to img.imageheight-1 do
   begin
     ppixel:=img.getimagepointer(0,y);   // inlinable in theory, 4-6 instructions
     ppixelend:=@ppixel[img.imagewidth]; // address one past the last pixel of this row
     while (ppixel<ppixelend) do  // 1 moving var in this loop.
       begin
         pixvalue:=ppixel^;
         // operate on pixvalue or use ppixel^ directly.
         inc(ppixel);
       end;
   end;
end;

Note that this way of walking the image is independent of whether the image
is stored top-down or not. And the general skeleton of the code should keep
working if I change the parameter from 8-bit to 16-bit.

In reality, the Delphi optimizer doesn't do a great job when inlining such
generalized inline methods. But even getting halfway there is already quite
fast.