[fpc-devel] Re: dominant short strings in compiler source

Daniël Mantione daniel.mantione at freepascal.org
Fri May 19 09:29:03 CEST 2006



On Thu, 18 May 2006, Flávio Etrusco wrote:

> > L> Dynamic arrays can be very handy and I never knew anyone who avoids
> > L> them. Of course if your array has fixed length there's no reason
> > L> to use a dynamic array either.
> > L> Fortunately it's no very often that one falls in Borland's trap
> > L> that dynamic arrays aren't copy-on-write like AnsiStrings... BTW, 
> > L> is this the behaviour in FPC, too?

Yes. Free Pascal is Delphi compatible here: dynamic arrays are reference
counted, but they are not copy-on-write.
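
To make the difference concrete, here is a small sketch (the program and
variable names are made up for illustration). Assigning a dynamic array
copies only the reference, so a write through one variable is visible
through the other. An AnsiString is made unique before the write.

```pascal
program dynarraysemantics;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

var
  a, b: TIntArray;
  s, t: AnsiString;
begin
  SetLength(a, 3);
  a[0] := 1;
  b := a;          // copies the reference, not the elements
  b[0] := 99;      // writes into the data shared with a
  WriteLn(a[0]);   // prints 99

  s := 'abc';
  t := s;          // shares the string data for now
  t[1] := 'x';     // copy-on-write: t gets its own copy first
  WriteLn(s);      // prints abc
end.
```

If you want an independent array, Copy(a) gives you one.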

Many people use getmem instead, combined with reallocmem if the array
size must change after the initial allocation. That is low-level
programming and therefore ugly, but many people consider dynamic arrays
ugly as well, because they differ a lot from standard Pascal semantics.
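
For reference, the getmem/reallocmem style looks roughly like the sketch
below. The program name and the sizes are arbitrary, and it relies on
typed pointers being indexable in the FPC and OBJFPC modes.

```pascal
program manualarray;
{$mode objfpc}

var
  p: PLongInt;
  i: Integer;
begin
  // Initial allocation: room for 4 elements.
  GetMem(p, 4 * SizeOf(LongInt));
  for i := 0 to 3 do
    p[i] := i * i;

  // Grow to 8 elements; the first 4 are preserved.
  ReallocMem(p, 8 * SizeOf(LongInt));
  for i := 4 to 7 do
    p[i] := i * i;

  WriteLn(p[3], ' ', p[7]);  // prints 9 49

  // Nothing is freed automatically, unlike with a dynamic array.
  FreeMem(p);
end.
```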

> > L> It's simply because the code has to check there's only one reference
> > L> to the string on each change. If you know there's no concurrent
> > L> access to the string (e.g. you app is single-threaded, or you have a
> > L> local copy of the string) you should access it as a PChar.

This code:

procedure z;

var a:string;

begin
  a:='abc';
end;

... generates this monster with $H+:

P$TESTASTRING_Z:
        push    ebp
        mov     ebp,esp
        sub     esp,52
        mov     dword [ebp-4],0
        lea     eax,[ebp-24]
        mov     ecx,eax
        lea     eax,[ebp-48]
        mov     edx,eax
        mov     eax,1
        call    NEAR FPC_PUSHEXCEPTADDR
        call    NEAR FPC_SETJMP
        push    eax
        test    eax,eax
        jne     NEAR ..@5
        lea     edx,[ebp-4]
        mov     eax,edx
        call    NEAR FPC_ANSISTR_DECR_REF
        mov     eax,dword [_$PROGRAM$_L12]
        mov     dword [ebp-4],eax
..@5:
        call    NEAR FPC_POPADDRSTACK
        mov     edx,dword INIT__SYSTEM_ANSISTRING
        lea     eax,[ebp-4]
        call    NEAR fpc_finalize
        pop     eax
        test    eax,eax
        je      NEAR ..@6
        call    NEAR FPC_RERAISE
..@6:
        leave
        ret

With $H- the result is:

P$TESTASTRING_Z:
        push    ebp
        mov     ebp,esp
        sub     esp,256
        lea     ecx,[ebp-256]
        mov     edx,dword _$PROGRAM$_L9
        mov     eax,255
        call    NEAR fpc_shortstr_to_shortstr
        leave
        ret

It is therefore not surprising that the shortstring version is faster.
Another reason is that temporary shortstrings are allocated on the stack,
and a "sub esp,xxxx" is a lot faster than a getmem.
Shortstrings also do not need a reallocmem when they grow.
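
The storage difference is visible from SizeOf. In the sketch below (names
are illustrative) the shortstring variable is the whole 256-byte buffer,
which matches the "sub esp,256" above, while the ansistring variable is
only a pointer to data allocated on the heap.

```pascal
program stringsizes;
{$mode objfpc}

var
  short: ShortString;
  ansi: AnsiString;
begin
  // Length byte plus up to 255 characters, stored in place.
  WriteLn(SizeOf(short));                   // prints 256
  // Only a pointer; the characters live on the heap.
  WriteLn(SizeOf(ansi) = SizeOf(Pointer));  // prints TRUE
end.
```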

> > L> 
> > L> > But they are said to be improved in recent versions (recent
> > L> snapshots?).
> > L> 
> > L> I find it strange that the cost of copying a ShortString (maybe
> > L> because they are at most 255 bytes? Maybe cache locality usually
> > L> is fine in this case? 8-|   ) is lower(better) than the
> > L> locked-count-reference and the exception trapping...

A shortstring copy is really fast. The assembler code copies 4 bytes at a
time, so you need at most 64 steps to copy a string of maximum length
(256 bytes including the length byte). Most strings are shorter, like the
example above.
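
The arithmetic behind the 64 steps can be sketched as follows. CopySteps
is a hypothetical helper, not an RTL routine. It only counts 4-byte moves
and assumes the length byte is copied along with the characters; the real
assembler routine may handle the trailing bytes differently.

```pascal
program copysteps;
{$mode objfpc}

// Hypothetical helper: number of 4-byte moves needed to copy the
// length byte plus the characters of a shortstring, rounded up.
function CopySteps(const s: ShortString): Integer;
begin
  Result := (Length(s) + 1 + 3) div 4;
end;

var
  s: ShortString;
begin
  WriteLn(CopySteps('abc'));    // prints 1
  s := StringOfChar('x', 255);  // a string of maximum length
  WriteLn(CopySteps(s));        // prints 64
end.
```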

However, you are right that copying is a limiting factor in shortstring
performance. 

> > L> Anyway, isn't it just the case to correctly optimize string
> > L> parameters as 'const' and 'var', and maybe using PChar in some few places, or
> > L> can you think of any other reason for AnsiStrings to be slower than
> > L> ShortStrings?

A lot of them, see above. 

Daniël

