[fpc-pascal] Re: UnicodeString comparison performance
Thomas Schatzl
tom_at_work at gmx.at
Tue Jul 24 12:20:00 CEST 2012
HI,
On Mon, 2012-07-23 at 16:58 +0200, OBones wrote:
> Jonas Maebe wrote:
> > On 23 Jul 2012, at 10:58, OBones wrote:
> >
> >> leledumbo wrote:
> >>> I look at the generated code and in the direct one there's additional
> >>> overhead of decrementing the reference counter on each iteration.
> >> I see it too now (I forgot about the -a option).
> >> I can understand why there is a call to the decrementer outside the loop when using the variable, but I don't understand why it pushes it completely out of the loop. I mean, isn't there a reference count problem here?
> > Reference counted data types are returned by reference in a location passed to the function by the caller. The compiler here passes the address of S to the function, so that when assigning something to the function result inside the function, S' reference count gets decreased.
> >
> > The fact that when the result is returned in a temp, this temp also gets finalized on the caller side before the call is just a code generator inefficiency. I've changed that in trunk.
> Thanks for the explanation and the fix.
>
> Because the finalization happened too early, those memory allocations
> and deallocations were very costly and I found the direct code to be 30
> times slower.
> Doing it this way has the advantage of being inherently thread safe, but
> considering the performance penalty, I have moved to doing it this way:
>
> var
> Buffer: array [0..1024*1024-1] of WideChar;
> BufferLock: NativeInt;
>
> function GetSomeString(Index: Integer): UnicodeString;
> begin
> while BufferLock > 0 do
> Sleep(1);
>
> InterlockedIncrement(BufferLock);
> try
> CallToAnAPIThatWritesBackAWideString(@Buffer[0], Length(Buffer) - 1);
> Result := PWideChar(@Buffer[0]);
> finally
> InterlockedDecrement(BufferLock);
> end;
> end;
>
> This works, with an equivalent penalty on both methods and is threadsafe
> (I believe).
This code is not thread safe at all. A thread switch after the while
loop and before the increment will not prevent progress on other
threads, so multiple threads can enter the "critical section".
Use EnterCriticalSection/LeaveCriticalSection from the RTL if you need
that.
If you could somewhat decrease the size of your buffer to something
reasonable - do you really expect to translate a string with 1M chars? -
use the stack. Actually even a 2M data object on the stack might be
reasonable.
Further, in any case you'll need some code that works in "all" cases
anyway - what if your string does exceed 1M chars after all? I mean, on
a 1M string, heap allocation will probably be the least of your
performance worries.
This will completely avoid use of any synchronization primitive.
E.g.
function GetSomeString(index : integer) : UnicodeString;
const
bufsize = 1024; // as you wish
var
buffer : array[0..bufsize-1] of WideChar;
pbuffer : PWideChar;
islongstring : boolean;
lengthofstring : integer;
begin
lengthofstring := length(stringfromindex(integer));
islongstring := lengthofstring > bufsize;
if (islongstring) then
pbuffer := getmem(lengthofstring * sizeof(WideChar));
else
pbuffer := @buffer;
CallToAnAPIThatWritesBackAWideString(stringfromindex(integer),
pbuffer, lengthofstring);
if (islongstring) then
freemem(pbuffer);
result := pbuffer;
end;
You may want to improve the code further by e.g. checking whether the
memory allocation has been successful or not; or factor out the case
when the string is big into a separate method to improve readability.
Thomas
More information about the fpc-pascal
mailing list