[fpc-pascal] Re: UnicodeString comparison performance

Tue Jul 24 12:20:00 CEST 2012

HI,

On Mon, 2012-07-23 at 16:58 +0200, OBones wrote:
> Jonas Maebe wrote:
> > On 23 Jul 2012, at 10:58, OBones wrote:
> >
> >> leledumbo wrote:
> >>> I look at the generated code and in the direct one there's additional
> >>> overhead of decrementing the reference counter on each iteration.
> >> I see it too now (I forgot about the -a option).
> >> I can understand why there is a call to the decrementer outside the loop when using the variable, but I don't understand why it pushes it completely out of the loop. I mean, isn't there a reference count problem here?
> > Reference counted data types are returned by reference in a location passed to the function by the caller. The compiler here passes the address of S to the function, so that when assigning something to the function result inside the function, S' reference count gets decreased.
> >
> > The fact that when the result is returned in a temp, this temp also gets finalized on the caller side before the call is just a code generator inefficiency. I've changed that in trunk.
> Thanks for the explanation and the fix.
>
> Because the finalization happened too early, those memory allocations 
> and deallocations were very costly and I found the direct code to be 30 
> times slower.
> Doing it this way has the advantage of being inherently thread safe, but 
> considering the performance penalty, I have moved to doing it this way:
> 
> var
>    Buffer: array [0..1024*1024-1] of WideChar;
>    BufferLock: NativeInt;
> 
> function GetSomeString(Index: Integer): UnicodeString;
> begin
>    while BufferLock > 0 do
>      Sleep(1);
> 
>    InterlockedIncrement(BufferLock);
>    try
>      CallToAnAPIThatWritesBackAWideString(@Buffer[0], Length(Buffer) - 1);
>      Result := PWideChar(@Buffer[0]);
>    finally
>      InterlockedDecrement(BufferLock);
>    end;
> end;
> 
> This works, with an equivalent penalty on both methods and is threadsafe 
> (I believe).

This code is not thread safe at all. A thread switch after the while
loop and before the increment will not prevent progress on other
threads, so multiple threads can enter the "critical section".

Use EnterCriticalSection/LeaveCriticalSection from the RTL if you need
that.

If you could somewhat decrease the size of your buffer to something
reasonable - do you really expect to translate a string with 1M chars? -
use the stack. Actually even a 2M data object on the stack might be
reasonable.

Further, in any case you'll need some code that works in "all" cases
anyway - what if your string does exceed 1M chars after all? I mean, on
a 1M string, heap allocation will probably be the least of your
performance worries.

This will completely avoid use of any synchronization primitive.

E.g.

function GetSomeString(index : integer) : UnicodeString;
const
  bufsize = 1024; // as you wish
var
  buffer : array[0..bufsize-1] of WideChar;
  pbuffer : PWideChar;
  islongstring : boolean;
  lengthofstring : integer;
begin
  lengthofstring := length(stringfromindex(integer));
  islongstring := lengthofstring > bufsize;
  if (islongstring) then
    pbuffer := getmem(lengthofstring * sizeof(WideChar));
  else
    pbuffer := @buffer;
  CallToAnAPIThatWritesBackAWideString(stringfromindex(integer),
pbuffer, lengthofstring);
  if (islongstring) then
    freemem(pbuffer);
  result := pbuffer;
end;

You may want to improve the code further by e.g. checking whether the
memory allocation has been successful or not; or factor out the case
when the string is big into a separate method to improve readability.

Thomas