[fpc-pascal] Re: UnicodeString comparison performance
OBones
obones at free.fr
Mon Jul 23 16:58:00 CEST 2012
Jonas Maebe wrote:
> On 23 Jul 2012, at 10:58, OBones wrote:
>
>> leledumbo wrote:
>>> I look at the generated code and in the direct one there's additional
>>> overhead of decrementing the reference counter on each iteration.
>> I see it too now (I forgot about the -a option).
>> I can understand why there is a call to the decrementer outside the loop when using the variable, but I don't understand why it pushes it completely out of the loop. I mean, isn't there a reference count problem here?
> Reference counted data types are returned by reference in a location passed to the function by the caller. The compiler here passes the address of S to the function, so that when assigning something to the function result inside the function, S' reference count gets decreased.
>
> The fact that when the result is returned in a temp, this temp also gets finalized on the caller side before the call is just a code generator inefficiency. I've changed that in trunk.
Thanks for the explanation and the fix.
In my example the difference is very small, but in my real code, it was
amplified by the fact that I was doing this:
  SetLength(Result, 1024 * 1024);
  CallToAnAPIThatWritesBackAWideString(@Result[1], Length(Result) - 1);
  SetLength(Result, StrLen(PWideChar(Result)));
Because the finalization happened too early, those memory allocations
and deallocations were very costly and I found the direct code to be 30
times slower.
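To make the two call patterns concrete, here is a minimal sketch of the "variable" and "direct" forms as I understand them from this thread. The stub body of CallToAnAPIThatWritesBackAWideString, the 'hello' payload, and the loop counts are assumptions for illustration only, not the real API or the real code.

```pascal
program DirectVsVariable;
{$mode objfpc}{$H+}

uses
  SysUtils; // StrLen overload for PWideChar

// Hypothetical stand-in for the real API: copies a short,
// null-terminated wide string into the caller's buffer.
procedure CallToAnAPIThatWritesBackAWideString(Buf: PWideChar; MaxLen: Integer);
const
  Sample: UnicodeString = 'hello';
var
  N: Integer;
begin
  N := Length(Sample);
  if N > MaxLen then
    N := MaxLen;
  Move(Sample[1], Buf^, N * SizeOf(WideChar));
  Buf[N] := #0;
end;

function GetSomeString: UnicodeString;
begin
  // Allocate a large buffer, let the API fill it, then trim to the real length.
  SetLength(Result, 1024 * 1024);
  CallToAnAPIThatWritesBackAWideString(@Result[1], Length(Result) - 1);
  SetLength(Result, StrLen(PWideChar(Result)));
end;

var
  S: UnicodeString;
  I, Matches: Integer;
begin
  Matches := 0;

  // "Variable" form: the result is stored in S, then compared.
  for I := 1 to 10 do
  begin
    S := GetSomeString();
    if S = 'hello' then
      Inc(Matches);
  end;

  // "Direct" form: the result lives in a compiler temp and is compared in place.
  for I := 1 to 10 do
    if GetSomeString() = 'hello' then
      Inc(Matches);

  WriteLn(Matches);
end.
```

Compiling this with the -a option and comparing the two loops in the generated assembly should show where the temp is finalized in each form.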
Doing it this way has the advantage of being inherently thread safe, but
considering the performance penalty, I have moved to doing it this way:
  var
    Buffer: array [0..1024*1024-1] of WideChar;
    BufferLock: NativeInt;

  function GetSomeString(Index: Integer): UnicodeString;
  begin
    while BufferLock > 0 do
      Sleep(1);
    InterlockedIncrement(BufferLock);
    try
      CallToAnAPIThatWritesBackAWideString(@Buffer[0], Length(Buffer) - 1);
      Result := PWideChar(@Buffer[0]);
    finally
      InterlockedDecrement(BufferLock);
    end;
  end;
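This works, with an equivalent penalty on both methods. I believe it is
thread-safe, although the check of BufferLock and the increment are two
separate steps, so two threads could in principle both get past the loop.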
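One alternative that does not depend on the BufferLock counter is a critical section from the RTL. A minimal sketch, assuming the same shared buffer; the stub body of CallToAnAPIThatWritesBackAWideString and the 'hello' payload are hypothetical stand-ins for the real API.

```pascal
program BufferWithCriticalSection;
{$mode objfpc}{$H+}

// Note: a real multi-threaded program on Unix also needs the cthreads
// unit as the first unit in its uses clause.

var
  Buffer: array [0..1024*1024-1] of WideChar;
  BufferLock: TRTLCriticalSection;

// Hypothetical stand-in for the real API: copies a short,
// null-terminated wide string into the caller's buffer.
procedure CallToAnAPIThatWritesBackAWideString(Buf: PWideChar; MaxLen: Integer);
const
  Sample: UnicodeString = 'hello';
var
  N: Integer;
begin
  N := Length(Sample);
  if N > MaxLen then
    N := MaxLen;
  Move(Sample[1], Buf^, N * SizeOf(WideChar));
  Buf[N] := #0;
end;

function GetSomeString(Index: Integer): UnicodeString;
begin
  // Only one thread at a time may use the shared buffer.
  EnterCriticalSection(BufferLock);
  try
    CallToAnAPIThatWritesBackAWideString(@Buffer[0], Length(Buffer) - 1);
    // Copy the buffer contents into a new string before releasing the lock.
    Result := PWideChar(@Buffer[0]);
  finally
    LeaveCriticalSection(BufferLock);
  end;
end;

begin
  InitCriticalSection(BufferLock);
  try
    WriteLn(Length(GetSomeString(0)));
  finally
    DoneCriticalSection(BufferLock);
  end;
end.
```

The critical section makes waiting threads block instead of polling with Sleep(1), at the cost of having to initialize and release it once per program run.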
Should anyone have a better workaround, I'd be very pleased to hear
about it as I can't move to the trunk version of FPC just yet.
Many thanks for the help
Regards