[fpc-pascal] FPImage and GetDataLineStart

Thu Apr 21 20:55:12 CEST 2011

In our previous episode, Mattias Gaertner said:
>  > > GetScanLine/SetScanLine,
>  > > and all other FPIMage operations will work equally well.
>  >
>  > This will not accelerate reading/writing at all. When writing (or any other
>  > form of streaming), there will still be a procedure call per pixel and
>  > conversion to/from 64-bit.
> Yes, it does, because of cache effects.

Throw away a euro, gain 2 cents.

>  > class, and make some of the aspects generic. (e.g. for a yuv422 image you
>  > need to define a record with several inline static methods and then
>  > parameterize
>  > some generic yuv class that is a descendant of a generic 8-bit storage
>  > class etc) with that record. (the generic yuv class then knows what methods
>  > to call, and are somewhat efficient due to the inlining)
> Can you give an example, how this should work?
> For example, how would you alter the following routine:
>
> procedure PaintSomething(img: TFPCustomImage);
> var x,y: integer;
> begin
>   for y:=10 to img.Height-10 do
>     for x:=10 to img.Width-10 do
>       img.Colors[x,y]:=colRed;
> end;

It is pixel level access, so it won't be optimized. Which is exactly the
problem, since currently fcl-image doesn't provide anything but pixel level
access.

In theory, the only advantage is that img could be a generic descendant
that implements the exact yuv422 storage class, with the rgb to yuv
conversion inlined for the exact yuv and rgb types. So you would keep the
virtual call, but the code in it would be generated for the specific
tcolor32 (Tcolor64?) to yuv422 conversion.

HOWEVER, if your procedure actually named the specific type, it would be a
lot faster. So if you only generate, say, RGBA 32-bit images, the above code
would _NOT_ call any procedure, since the whole shebang would be inlinable,
and in the best cases it could get pretty close to optimal thanks to CSE
(common subexpression elimination) or hoisting of loop invariants.
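
To make the "named type" case a bit more concrete, here is a minimal sketch.
It is not fcl-image code: TRGBA32 and TRGBA32Image are hypothetical names
made up for this illustration. The only point is that with a concrete class
and a non-virtual inline setter, nothing in the loop body has to remain a
call.

```pascal
program typedpaint;
{$mode objfpc}{$H+}
{$inline on}

type
  // Hypothetical 32-bit RGBA pixel, not an fcl-image type.
  TRGBA32 = packed record
    R, G, B, A: Byte;
  end;

  // Hypothetical concrete image class with non-virtual, inlinable accessors.
  TRGBA32Image = class
  private
    FWidth, FHeight: Integer;
    FData: array of TRGBA32;
  public
    constructor Create(AWidth, AHeight: Integer);
    procedure SetPixel(x, y: Integer; const c: TRGBA32); inline;
    function GetPixel(x, y: Integer): TRGBA32; inline;
    property Width: Integer read FWidth;
    property Height: Integer read FHeight;
  end;

constructor TRGBA32Image.Create(AWidth, AHeight: Integer);
begin
  inherited Create;
  FWidth := AWidth;
  FHeight := AHeight;
  SetLength(FData, AWidth * AHeight);
end;

procedure TRGBA32Image.SetPixel(x, y: Integer; const c: TRGBA32);
begin
  // Plain store into the native buffer, no 64-bit conversion.
  FData[y * FWidth + x] := c;
end;

function TRGBA32Image.GetPixel(x, y: Integer): TRGBA32;
begin
  Result := FData[y * FWidth + x];
end;

const
  Red: TRGBA32 = (R: 255; G: 0; B: 0; A: 255);

// Same loop as the quoted routine, but naming the concrete type.
procedure PaintSomething(img: TRGBA32Image);
var
  x, y: Integer;
begin
  for y := 10 to img.Height - 10 do
    for x := 10 to img.Width - 10 do
      img.SetPixel(x, y, Red);
end;

var
  img: TRGBA32Image;
begin
  img := TRGBA32Image.Create(64, 64);
  try
    PaintSomething(img);
    WriteLn(img.GetPixel(10, 10).R);
  finally
    img.Free;
  end;
end.
```

Whether the compiler really inlines SetPixel depends on version and
optimization settings; the sketch only shows that nothing in the source
forces a virtual call per pixel.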

STILL, that is not the objective. The main objective is to allow fast
line-based access using the native types for the "easy" types:

1) provide several speedy base storage sizes (8, 16, 32, 64 bit, maybe 24
too) without duplicating code.

2) provide row access for these basic types. Yes, I know. It doesn't exactly
provide for 12-bit packed access, which means such types would still follow
the slow "generic" way by mapping those to 64bpp.
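
As a rough sketch of points 1) and 2), assuming a reasonably recent FPC with
generics and nested types in classes: TRowStorage, RowStart and the
TStorageNN names are hypothetical, not part of fcl-image or TLazIntfImage.
One generic class is specialized per storage size, and the caller gets a
typed pointer to the start of a row.

```pascal
program rowaccess;
{$mode objfpc}{$H+}
{$inline on}

type
  // Hypothetical generic storage class; TPixel is the native pixel type.
  generic TRowStorage<TPixel> = class
  public type
    PPixel = ^TPixel;
  private
    FWidth, FHeight: Integer;
    FData: array of TPixel;
  public
    constructor Create(AWidth, AHeight: Integer);
    // Typed pointer to the first pixel of row y.
    function RowStart(y: Integer): PPixel; inline;
    property Width: Integer read FWidth;
    property Height: Integer read FHeight;
  end;

  // One source, several storage sizes.
  TStorage8  = specialize TRowStorage<Byte>;
  TStorage16 = specialize TRowStorage<Word>;
  TStorage32 = specialize TRowStorage<LongWord>;
  TStorage64 = specialize TRowStorage<QWord>;

constructor TRowStorage.Create(AWidth, AHeight: Integer);
begin
  inherited Create;
  FWidth := AWidth;
  FHeight := AHeight;
  SetLength(FData, AWidth * AHeight);
end;

function TRowStorage.RowStart(y: Integer): PPixel;
begin
  Result := @FData[y * FWidth];
end;

var
  img: TStorage32;
  p: TStorage32.PPixel;
  x, y: Integer;
begin
  img := TStorage32.Create(64, 64);
  try
    for y := 10 to img.Height - 10 do
    begin
      p := img.RowStart(y);          // one call per row, not per pixel
      for x := 10 to img.Width - 10 do
        p[x] := $FF0000FF;           // native 32-bit store
    end;
    p := img.RowStart(10);
    WriteLn(HexStr(p[10], 8));
  finally
    img.Free;
  end;
end.
```

A packed 12-bit format has no natural TPixel here, which is why such types
would stay on the slow generic path.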

>  > The fun part is that you can still implement the current way as baseline
>  > this way, and then progressively implement quicker variants.
> Note: TLazIntfImage already contains a lot of access functions for many common
> memory formats.

I know. But see the generics class above. It is 500 lines, and provides
pretty much the same without case statements or virtual procedures.

If you declare your class as e.g. TBW32image (RGBA), a pixel assignment is
an inline handful of instructions.

I think fcl-image with its no-compromise format support was a great initial
effort.  But it is time to at least allow some internal shortcuts for
performance's sake. And generics allow a way to put this into a class
hierarchy without too much code duplication.