[fpc-devel] Performance of string handling in trunk

Jonas Maebe jonas.maebe at elis.ugent.be
Thu Jun 27 15:33:12 CEST 2013


On 27 Jun 2013, at 03:54, luiz americo pereira camara wrote:

> 2013/6/21 Sergei Gorelkin <sergei_gorelkin at mail.ru>:
>
>> I've profiled the code and found no conversions taking place. All the
>> slowdown appears to be caused by other reasons, hard to tell the topmost
>> contributor. What catches the eye is the large amount of calls to
>> UniqueString, and the fact that SetCodePage goes through implicit
>> try..finally block even if it does not need to convert the string.
>
> Seems that Florian changed SetCodePage to avoid implicit try finally.
>
> It improved the performance slightly but still a lot slower than 2.6.X.


The speed is virtually identical for me between FPC 2.6.x and trunk on  
Mac OS X/PPC. Of course, it's an incomplete test program and there is  
no information about how it is compiled. Additionally, the timings in  
the last post in that thread are completely different from the ones in  
the first post, so they seem to come from a different test program or
a different system. That should be specified: comparing arbitrary
numbers is useless if you are interested in finding out what the cause
is.

There was a small difference if the program either was compiled with
-Fcxxx or contained a {$codepage xxx} directive. In 2.7.1 that causes the
constant strings in the program to get that particular code page rather
than CP_ACP. In this case the test program became 10% slower. The reason
was that new ansistrings created in the rtl (e.g. for char to ansistring,
or when concatenating ansistrings) get CP_ACP by default, and changing
this afterwards to the custom code page meant going through
InternalSetCodePage() with its exception frame setup/teardown. I solved
that in r24985.
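
To make the scenario above concrete, here is a minimal sketch (not the
test program from the thread, which I don't reproduce here). It assumes
FPC 2.7.1 or later, since it relies on RawByteString, StringCodePage and
SetCodePage from the system unit. The program name, the constant and the
variable are made up for illustration. The numbers it prints depend on
the RTL revision and on the system's default code page, so no expected
output is given.

```pascal
program cpdemo;
{$mode objfpc}{$H+}
{$codepage utf8}  // string constants in this unit are tagged UTF-8, not CP_ACP

const
  Greeting = 'hello';  // hypothetical constant, for illustration only

var
  s: RawByteString;    // RawByteString keeps whatever code page tag it is given
begin
  // Assigning the constant keeps the code page tag it was compiled with.
  s := Greeting;
  writeln('constant assigned:   ', StringCodePage(s));

  // Concatenation creates a new ansistring inside the RTL; which code
  // page tag it ends up with is exactly the version-dependent part.
  s := s + '!';
  writeln('after concatenation: ', StringCodePage(s));

  // Retag the string without converting its bytes (Convert = False).
  // Retagging freshly created strings like this is the kind of operation
  // that went through the exception frame described above.
  SetCodePage(s, CP_UTF8, False);
  writeln('after SetCodePage:   ', StringCodePage(s));
end.
```

Compiling this once with and once without the {$codepage utf8} line
should show which strings pick up the custom code page and which ones
start out with the default.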

Under Linux/i386, however, I do still see a significant slowdown
(regardless of the code page used). Strangely enough, it goes away for
me if the system unit is compiled with -O2 -Oonostackframe instead of
with -O2. I don't know why. It might be some ugly cache conflict, but
for such a small program and so little data that is unlikely to be the
case on modern x86 caches. It might also be code alignment, but some
playing with -Oaloop and -Oaproc doesn't seem to change it either.


Jonas


