[fpc-devel] Re: dominant short strings in compiler source
Daniël Mantione
daniel.mantione at freepascal.org
Fri May 19 09:29:03 CEST 2006
On Thu, 18 May 2006, Flávio Etrusco wrote:
> > L> Dynamic arrays can be very handy and I never knew anyone who avoids
> > L> them. Of course if your array has fixed length there's no reason
> > L> to use a dynamic array either.
> > L> Fortunately it's not very often that one falls into Borland's trap
> > L> that dynamic arrays aren't copy-on-write like AnsiStrings... BTW,
> > L> is this the behaviour in FPC, too?
Yes. Free Pascal is Delphi compatible here: dynamic arrays are reference
counted, but they are not copy-on-write.
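To make the difference concrete, here is a minimal sketch (the program and variable names are made up for illustration). It assumes {$H+}, so "string" is an AnsiString, and a compiler version where Copy accepts a dynamic array:

```pascal
program cowdemo;
{$mode objfpc}{$H+}

var
  s1, s2: string;            // AnsiString because of $H+
  a1, a2: array of Integer;  // dynamic arrays

begin
  s1 := 'abc';
  s2 := s1;          // both variables refer to the same string data
  s2[1] := 'X';      // writing makes s2 unique first (copy-on-write)
  WriteLn(s1);       // s1 is untouched: prints abc

  SetLength(a1, 3);
  a1[0] := 1;
  a2 := a1;          // both variables refer to the same array data
  a2[0] := 99;       // no copy is made, so a1 sees the change
  WriteLn(a1[0]);    // prints 99

  a2 := Copy(a1);    // an explicit copy gives a2 its own data
  a2[0] := 7;
  WriteLn(a1[0]);    // still prints 99
end.
```

If you want value semantics for a dynamic array, you have to ask for the copy yourself, as in the last three statements.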
A lot of people use getmem, possibly combined with reallocmem if the array
size has to change after the initial allocation. That is low-level programming
and therefore ugly, but many people consider dynamic arrays ugly as well,
because they differ a lot from standard Pascal semantics.
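For comparison, a small sketch of both styles side by side. The names are hypothetical, and it relies on FPC allowing a typed pointer such as PInteger to be indexed like an array:

```pascal
program rawarray;
{$mode objfpc}

var
  p: PInteger;           // manually managed block of integers
  d: array of Integer;   // dynamic array doing the same job
  i: Integer;

begin
  // Low-level style: allocate, grow and free by hand.
  GetMem(p, 4 * SizeOf(Integer));
  for i := 0 to 3 do
    p[i] := i * i;
  ReallocMem(p, 8 * SizeOf(Integer));  // the first 4 elements are kept
  for i := 4 to 7 do
    p[i] := i * i;
  WriteLn(p[7]);                       // prints 49
  FreeMem(p);

  // Dynamic array style: SetLength grows, the compiler frees.
  SetLength(d, 4);
  for i := 0 to 3 do
    d[i] := i * i;
  SetLength(d, 8);                     // existing elements are kept
  for i := 4 to 7 do
    d[i] := i * i;
  WriteLn(d[7]);                       // prints 49
end.
```

The getmem version has no bounds information and no automatic cleanup; the dynamic array version has both, at the price of reference-counting semantics.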
> > L> It's simply because the code has to check there's only one reference
> > L> to the string on each change. If you know there's no concurrent
> > L> access to the string (e.g. your app is single-threaded, or you have a
> > L> local copy of the string) you should access it as a PChar.
This code:
procedure z;
var
  a: string;
begin
  a := 'abc';
end;
... generates this monster with $H+:
P$TESTASTRING_Z:
push ebp
mov ebp,esp
sub esp,52
mov dword [ebp-4],0
lea eax,[ebp-24]
mov ecx,eax
lea eax,[ebp-48]
mov edx,eax
mov eax,1
call NEAR FPC_PUSHEXCEPTADDR
call NEAR FPC_SETJMP
push eax
test eax,eax
jne NEAR ..@5
lea edx,[ebp-4]
mov eax,edx
call NEAR FPC_ANSISTR_DECR_REF
mov eax,dword [_$PROGRAM$_L12]
mov dword [ebp-4],eax
..@5:
call NEAR FPC_POPADDRSTACK
mov edx,dword INIT__SYSTEM_ANSISTRING
lea eax,[ebp-4]
call NEAR fpc_finalize
pop eax
test eax,eax
je NEAR ..@6
call NEAR FPC_RERAISE
..@6:
leave
ret
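Most of that listing is an implicit exception frame: the compiler has to make sure the ansistring is released even if an exception passes through the procedure. A rough source-level equivalent is sketched below. z_expanded is a made-up name and only an approximation of the generated code, not what the compiler literally emits:

```pascal
program implicitfinally;
{$mode objfpc}{$H+}

// The original procedure: one assignment to a local ansistring.
procedure z;
var
  a: string;
begin
  a := 'abc';
end;

// Approximately what the compiler arranges behind the scenes for z.
procedure z_expanded;
var
  a: string;
begin
  try            // FPC_PUSHEXCEPTADDR / FPC_SETJMP set up the frame
    a := 'abc';  // FPC_ANSISTR_DECR_REF on the old value, then store
  finally
    a := '';     // stands in for the fpc_finalize call on the local
  end;           // FPC_RERAISE runs only if an exception was caught
end;

begin
  z;
  z_expanded;
end.
```

Note that z_expanded still gets its own implicit frame on top of the explicit one, so it only serves to read the listing, not to speed anything up.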
With $H- the result is:
P$TESTASTRING_Z:
push ebp
mov ebp,esp
sub esp,256
lea ecx,[ebp-256]
mov edx,dword _$PROGRAM$_L9
mov eax,255
call NEAR fpc_shortstr_to_shortstr
leave
ret
It is therefore not surprising that the shortstring version is faster.
Another reason is that temporary shortstrings are allocated on the stack:
a "sub esp,xxxx" is a lot faster than a getmem. Shortstrings also do not
need a reallocmem when they grow.
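If you want to measure this yourself, a micro-benchmark along the following lines should do. It is only a sketch: the procedure names and loop counts are arbitrary, it assumes an RTL that provides GetTickCount64 in SysUtils, and the numbers depend heavily on compiler version, optimization settings and memory manager:

```pascal
program strbench;
{$mode objfpc}

uses
  SysUtils;

const
  Iterations = 100000;   // arbitrary, tune for your machine

// Grows a shortstring: the 256-byte buffer lives on the stack.
procedure GrowShort;
var
  s: ShortString;
  i: Integer;
begin
  s := '';
  for i := 1 to 200 do
    s := s + 'x';
end;

// Grows an ansistring: the data lives on the heap and may be reallocated.
procedure GrowAnsi;
var
  s: AnsiString;
  i: Integer;
begin
  s := '';
  for i := 1 to 200 do
    s := s + 'x';
end;

var
  i: Integer;
  t: QWord;

begin
  t := GetTickCount64;
  for i := 1 to Iterations do
    GrowShort;
  WriteLn('ShortString: ', GetTickCount64 - t, ' ms');

  t := GetTickCount64;
  for i := 1 to Iterations do
    GrowAnsi;
  WriteLn('AnsiString:  ', GetTickCount64 - t, ' ms');
end.
```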
> > L>
> > L> > But they are said to be improved in recent versions (recent
> > L> snapshots?).
> > L>
> > L> I find it strange that the cost of copying a ShortString (maybe
> > L> because they are at most 255 bytes? Maybe cache locality usually
> > L> is fine in this case? 8-| ) is lower(better) than the
> > L> locked-count-reference and the exception trapping...
A shortstring copy is really fast. Shortstrings are copied 4 bytes at a time
in assembler code, so you need at most 64 steps to copy a string of maximum
length (256 bytes including the length byte). Most strings are shorter, like
the example above. However, you are right that copying is a limiting factor
in shortstring performance.
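The idea of the copy loop can be sketched in Pascal as follows. This is an illustration, not the RTL's assembler routine: CopyShortStr and TDWordBuf are made-up names, and the sketch assumes both strings are full 256-byte ShortString variables and that unaligned 4-byte accesses are acceptable, as they are on x86:

```pascal
program copysketch;
{$mode objfpc}{$H-}

type
  TDWordBuf = array[0..63] of LongWord;   // 64 * 4 = 256 bytes
  PDWordBuf = ^TDWordBuf;

// Copies Src to Dst in 4-byte units. Assumes Src really occupies
// 256 bytes, because the last unit may read up to 3 bytes past the text.
procedure CopyShortStr(const Src: ShortString; out Dst: ShortString);
var
  i, steps: Integer;
  s, d: PDWordBuf;
begin
  // length byte plus characters, rounded up to whole 4-byte units
  steps := (Length(Src) + 1 + 3) div 4;   // at most 64
  s := PDWordBuf(@Src);
  d := PDWordBuf(@Dst);
  for i := 0 to steps - 1 do
    d^[i] := s^[i];
end;

var
  a, b: ShortString;

begin
  a := 'abc';
  CopyShortStr(a, b);   // one step: the length byte and 'abc'
  WriteLn(b);           // prints abc
end.
```

For the three-character string from the example a single step is enough, which is why short strings copy so cheaply.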
> > L> Anyway, isn't it just the case to correctly optimize string
> > L> parameters as 'const' and 'var', and maybe using PChar in some few places, or
> > L> can you think of any other reason for AnsiStrings to be slower than
> > L> ShortStrings?
A lot of them, see above.
Daniël