[fpc-devel] Re: dominant short strings in compiler source

Flávio Etrusco flavio.etrusco at gmail.com
Fri May 19 19:09:44 CEST 2006


On 5/19/06, Daniël Mantione <daniel.mantione at freepascal.org> wrote:
>
>
> On Thu, 18 May 2006, Flávio Etrusco wrote:
>
> > > L> Dynamic arrays can be very handy and I never knew anyone who avoids
> > > L> them. Of course if your array has fixed length there's no reason
> > > L> to use a dynamic array either.
> > > L> Fortunately it's not very often that one falls in Borland's trap
> > > L> that dynamic arrays aren't copy-on-write like AnsiStrings... BTW,
> > > L> is this the behaviour in FPC, too?
>
> Free Pascal is Delphi compatible.

I know that FPC aims to be Delphi-compatible, but it's not always the
case, as e.g. the WideStrings were reference-counted until a couple of
months ago.
So you are saying that in this specific case FPC is already
(unfortunately, as for the WideStrings case) compatible with Delphi?
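A minimal Free Pascal sketch of the two behaviours being compared, assuming {$mode objfpc}{$H+}; the program and function names are illustrative only. Assigning a dynamic array copies only the reference, so a write through the second variable is visible through the first. A shared AnsiString, by contrast, is copied on the first write:

```pascal
program SharedVsCow;
{$mode objfpc}{$H+}

type
  TIntArray = array of Integer;

{ Writes through a second variable that refers to the same dynamic
  array and returns what the first variable sees afterwards. }
function ArrayAliasResult: Integer;
var
  a, b: TIntArray;
begin
  SetLength(a, 1);
  a[0] := 1;
  b := a;        { copies the reference only; both point to one block }
  b[0] := 99;    { no copy-on-write, so the change is visible through a }
  Result := a[0];
end;

{ Same experiment with an AnsiString. }
function StringAliasResult: AnsiString;
var
  s, t: AnsiString;
begin
  s := 'abc';
  UniqueString(s); { turn the literal into a heap string with refcount 1 }
  t := s;          { shares the buffer, refcount becomes 2 }
  t[1] := 'X';     { shared, so t gets its own copy before the write }
  Result := s;
end;

begin
  WriteLn('array:  ', ArrayAliasResult);   { 99 }
  WriteLn('string: ', StringAliasResult);  { abc }
end.
```

If an independent array is needed, it has to be taken explicitly, for example with Copy(a, 0, Length(a)).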

> A lot of people use getmem combined with possibly reallocmem if the array
> size should change after initial allocation. It is low level programming
> and therefore ugly, but dynamic arrays are considered ugly as well
> by many people because they differ a lot from standard Pascal semantics.
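A sketch of the GetMem/ReallocMem style described above, assuming {$mode objfpc}, where typed pointers can be indexed like arrays. SumViaReallocMem is a hypothetical helper that grows the block by doubling; it is not code taken from the compiler:

```pascal
program RawGrowableArray;
{$mode objfpc}

{ Stores 1..Count in a manually managed heap block, growing it with
  ReallocMem as needed, and returns the sum of the stored values. }
function SumViaReallocMem(Count: Integer): Int64;
var
  p: PInteger;
  capacity, i: Integer;
begin
  capacity := 4;
  GetMem(p, capacity * SizeOf(Integer));
  try
    for i := 0 to Count - 1 do
    begin
      if i = capacity then
      begin
        capacity := capacity * 2;
        { ReallocMem may move the block; p is updated in place }
        ReallocMem(p, capacity * SizeOf(Integer));
      end;
      p[i] := i + 1;   { typed pointer indexed like an array }
    end;
    Result := 0;
    for i := 0 to Count - 1 do
      Inc(Result, p[i]);
  finally
    FreeMem(p);        { no automatic cleanup, unlike a dynamic array }
  end;
end;

begin
  WriteLn(SumViaReallocMem(10));  { 55 }
end.
```

Doubling keeps the number of ReallocMem calls logarithmic in Count; the try/finally stands in for the cleanup a dynamic array would get automatically.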
>
> > > L> It's simply because the code has to check there's only one reference
> > > L> to the string on each change. If you know there's no concurrent
> > > L> access to the string (e.g. your app is single-threaded, or you have a
> > > L> local copy of the string) you should access it as a PChar.
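A sketch of the PChar approach, assuming {$H+} and ASCII text; UpperAscii is a hypothetical name. UniqueString is called once so that the local copy owns its buffer, and the characters are then written through a PChar instead of S[i], which, as described above, would repeat the uniqueness check on every write:

```pascal
program PCharUpper;
{$mode objfpc}{$H+}

{ Upper-cases the ASCII letters of S. }
function UpperAscii(S: AnsiString): AnsiString;
var
  p: PChar;
  i: Integer;
begin
  UniqueString(S);   { one uniqueness check: S now owns its buffer }
  p := PChar(S);     { raw access, no further reference-count checks }
  for i := 0 to Length(S) - 1 do   { PChar indices start at 0 }
    p[i] := UpCase(p[i]);          { UpCase(Char) only maps 'a'..'z' }
  Result := S;
end;

begin
  WriteLn(UpperAscii('Hello, fpc!'));  { HELLO, FPC! }
end.
```

Because S is a value parameter, the caller's string is never modified: UniqueString copies it first if it is shared.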
>
> This code:
>
> procedure z;
>
> var a:string;
>
> begin
>   a:='abc';
> end;
>
> ... generates this monster with $H+:
>
> P$TESTASTRING_Z:
>         push    ebp
>         mov ebp,esp
>         sub esp,52
>         mov dword [ebp-4],0
>         lea eax,[ebp-24]
>         mov ecx,eax
>         lea eax,[ebp-48]
>         mov edx,eax
>         mov eax,1
>         call    NEAR FPC_PUSHEXCEPTADDR
>         call    NEAR FPC_SETJMP
>         push    eax
>         test    eax,eax
>         jne NEAR ..@5
>         lea edx,[ebp-4]
>         mov eax,edx
>         call    NEAR FPC_ANSISTR_DECR_REF
>         mov eax,dword [_$PROGRAM$_L12]
>         mov dword [ebp-4],eax
> ..@5:
>         call    NEAR FPC_POPADDRSTACK
>         mov edx,dword INIT__SYSTEM_ANSISTRING
>         lea eax,[ebp-4]
>         call    NEAR fpc_finalize
>         pop eax
>         test    eax,eax
>         je  NEAR ..@6
>         call    NEAR FPC_RERAISE
> ..@6:
>         leave
>         ret
>
> With $H- the result is:
>
> P$TESTASTRING_Z:
>         push    ebp
>         mov ebp,esp
>         sub esp,256
>         lea ecx,[ebp-256]
>         mov edx,dword _$PROGRAM$_L9
>         mov eax,255
>         call    NEAR fpc_shortstr_to_shortstr
>         leave
>         ret
>
> It is therefore not surprising that the shortstring version is faster.
> Other reasons why they are faster: temporary strings are allocated
> on the stack, and a "sub esp,xxxx" is a lot faster than a getmem.
> Shortstrings also do not need reallocmem if they grow.
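To see where the "sub esp,256" in the $H- listing comes from, a small check of the storage sizes (a sketch assuming {$mode objfpc} and the standard FPC string types). A ShortString is stored inline in the variable, while an AnsiString variable is only a pointer to heap data:

```pascal
program StringSizes;
{$mode objfpc}

var
  s255: ShortString;   { length byte + 255 characters, stored inline }
  s20: String[20];     { length byte + 20 characters, stored inline }
  a: AnsiString;       { only a pointer; the characters live on the heap }
begin
  WriteLn(SizeOf(s255));                  { 256, matching "sub esp,256" }
  WriteLn(SizeOf(s20));                   { 21 }
  WriteLn(SizeOf(a) = SizeOf(Pointer));   { TRUE }
end.
```

Declaring a shorter type such as String[20] shrinks the stack reservation accordingly.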
>
> > > L>
> > > L> > But they are said to be improved in recent versions (recent
> > > L> snapshots?).
> > > L>
> > > L> I find it strange that the cost of copying a ShortString (maybe
> > > L> because they are at most 255 bytes? Maybe cache locality usually
> > > L> is fine in this case? 8-| ) is lower (better) than the
> > > L> locked-count-reference and the exception trapping...
>
> A shortstring copy is really fast. They are copied with 4 bytes at a time
> in assembler code, so you need at most 64 steps to copy a string of
> maximum length. Most strings are shorter, like the example above.
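The 64-step bound is simple arithmetic: a shortstring occupies its length byte plus up to 255 characters, i.e. at most 256 bytes, and 256 / 4 = 64. A sketch of that count, assuming (as the remark about shorter strings suggests) that only the length byte and the used characters are moved; DwordSteps is a hypothetical helper, not an FPC routine:

```pascal
program ShortCopySteps;
{$mode objfpc}

{ Number of 4-byte moves needed to copy a shortstring of length Len:
  the length byte plus Len characters, rounded up to whole dwords. }
function DwordSteps(Len: Byte): Integer;
begin
  Result := (Len + 1 + 3) div 4;
end;

begin
  WriteLn(DwordSteps(3));    { 'abc': 4 bytes, 1 step }
  WriteLn(DwordSteps(255));  { 256 bytes, 64 steps }
end.
```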
>
> However, you are right that copying is a limiting factor in shortstring
> performance.
>
> > > L> Anyway, isn't it just the case to correctly optimize string
> > > L> parameters as 'const' and 'var', and maybe using PChar in a few places, or
> > > L> can you think of any other reason for AnsiStrings to be slower than
> > > L> ShortStrings?
>
> A lot of them, see above.
>
> Daniël

Wow, thanks a lot for all the info :-)
Which disassembler do you use? Is there a nice free one? I'll try
to digest that assembly, since I'm not a "close friend" of it ;-)
Also, that case is IMHO the bad case of AnsiString (i.e. we have to
add a reference). I'm more interested in whether there's any overhead
when reading from a 'const' string parameter...
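On the 'const' question, a sketch of read-only access, assuming {$mode objfpc}{$H+}; CountChar is a hypothetical helper. With a const parameter the callee reads the caller's string directly, so generally no reference-count increment/decrement or implicit try/finally is needed for it, and reading S[i] involves no uniqueness check because only writes trigger one. The generated code was not inspected here, so treat this as the expected behaviour rather than a measurement:

```pascal
program ConstParam;
{$mode objfpc}{$H+}

{ Counts occurrences of C in S. S is a const parameter: the routine may
  only read it, so no reference-count update is needed on entry or exit. }
function CountChar(const S: AnsiString; C: Char): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 1 to Length(S) do   { AnsiString indices start at 1 }
    if S[i] = C then           { reads do not call the uniqueness check }
      Inc(Result);
end;

begin
  WriteLn(CountChar('compiler source', 'c'));  { 2 }
end.
```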

Cheers,
Flávio


