[fpc-pascal]ansistrings

Mon Mar 1 12:00:13 CET 2004

On Mon, 1 Mar 2004, David Emerson wrote:

> Hey all,
>
> I've been poking around the sources and documentation for some insight into the details of how ansistrings are implemented, and I am left with some questions.
>
>
> It would be nice if, when comparing two ansistrings, fpc would first check to see if these two pointers are pointing to the same spot in memory, i.e. the same TAnsiRec. If they happen to be pointing to the same, a potentially long operation is reduced to a simple comparison of two memory addresses which probably only takes one processor cycle.
>
> Looking at fpc_ansistr_compare in astrings.inc, and at cgadd.addstring (the only function that seems to call fpc_ansistr_compare), it appears not to do this. Perhaps I'm wrong? I don't _really_ understand what the code is doing. I believe the sources I'm looking at are 1.0.10.

This is already implemented in version 1.9.2.

>
> If this quick comparison is in fact not implemented, I'd like to do it myself. (There are a number of places where I am checking long ansistrings for equality, and there is a reasonable chance that both pointers are pointing to the same address.)
>
> ( @s1[1] = @s2[1] ) seems to give the right result. is this the best way?
> or is it quicker/slower to use ( pointer(s1) = pointer(s2) )
> (no doubt more elegant)

The code uses ( pointer(s1) = pointer(s2) )
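
For anyone on an older compiler who wants the same shortcut by hand, here is a minimal sketch of the idea. The helper name SameAnsiString is hypothetical (it is not part of the RTL), and the program assumes a Free Pascal version that provides StringOfChar and UniqueString in the System unit.

```pascal
program FastCompare;
{$mode objfpc}{$H+}

{ Hypothetical helper, not an RTL routine: test the two string
  pointers first and fall back to a full comparison only when
  they differ. }
function SameAnsiString(const A, B: AnsiString): Boolean;
begin
  if Pointer(A) = Pointer(B) then
    Result := True          { same buffer (or both empty): equal }
  else
    Result := A = B;        { different buffers: compare contents }
end;

var
  s1, s2, s3: AnsiString;
begin
  s1 := StringOfChar('x', 100000);
  s2 := s1;                 { plain assignment shares the buffer }
  s3 := s1;
  UniqueString(s3);         { force a private copy, same contents }

  WriteLn(Pointer(s1) = Pointer(s2));   { shared buffer }
  WriteLn(Pointer(s1) = Pointer(s3));   { separate buffers }
  WriteLn(SameAnsiString(s1, s2));      { equal via the pointer check }
  WriteLn(SameAnsiString(s1, s3));      { equal via the full compare }
end.
```

The pointer test can only confirm equality. Two strings with different pointers may still hold the same text, so the fallback comparison is required.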

>
>
>
> Now on to the second question... getting those ansistrings pointed to the same address!
> (Some of them already are, but I'd like to get more...)
>
> I was kind of surprised to find that
>
>   s1 := 'hello';
>   s2 := 'hello';
>   writeln ( pointer(s1) = pointer(s2) );     ...writes FALSE

This is normal. But if you do it 'correctly', i.e.:

Const
  MyHello = 'Hello';

S1:= MyHello;
S2:= MyHello;

They will point at the same address.
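
Put together as a complete program, the constant-based version could look like the sketch below. The program name is made up for illustration, and the result of the pointer check is what the statement above leads one to expect. It has not been verified here against any particular compiler version.

```pascal
program SharedConst;
{$mode objfpc}{$H+}

const
  MyHello = 'Hello';

var
  S1, S2: AnsiString;
begin
  { Both variables are assigned from the same named constant. }
  S1 := MyHello;
  S2 := MyHello;

  { Compare the string pointers themselves, not the contents. }
  WriteLn(Pointer(S1) = Pointer(S2));
end.
```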

>
> Thus I assume that
>   readln (s1);
>   readln (s2);    ... would NEVER point them at the same address
>
> Of course, checking every string against every other string would comprise an absurd performance hit in most cases. What I'd really like is to have a relatively small number of constant strings that could be compared against, and only when reading in a data file, or perhaps certain fields in a data file. (yeah, reading will take longer)... Then if my data file (and thus my filled pascal array) has 1,000 instances of "some_complex_but_often_identically_repeated_data_value", I get the following:
>   - lots of memory savings
>   - operator "=" gives a very fast TRUE result when they are pointing to the same address
>
> In fact, all the string comparison operators return a particular value if the two are equal, so they could all give a fast result in this case. This could happen both when comparing datum values against each other, and when comparing them to a constant string in my code.
>
> Do resource strings offer this kind of intelligence, or are they designed for a completely different purpose, and offer no performance improvement for this special application?

They are normal ansistrings, just stored in a special table.

>
> Is there some special mode or compiler directive that does more string uniqueness checking, including at compile time (e.g. to find my two identical 'hello' assignments)?

No. Please use constants, that's what they're for.
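
Constants cannot cover strings that only exist at run time, such as values read from a data file. For that case a small lookup table can hand out one shared copy per distinct value. The sketch below is one possible approach, not something the compiler or RTL does for you. Pool and Intern are hypothetical names, and a sorted TStringList from the Classes unit is assumed as the table.

```pascal
program InternSketch;
{$mode objfpc}{$H+}

uses
  Classes;

var
  Pool: TStringList;   { hypothetical table of shared strings }

{ Return the pooled copy of S, adding S to the pool on first sight.
  Every caller that passes equal text gets the same buffer back. }
function Intern(const S: AnsiString): AnsiString;
var
  Idx: Integer;
begin
  if not Pool.Find(S, Idx) then   { binary search, needs Sorted = True }
    Idx := Pool.Add(S);
  Result := Pool[Idx];
end;

var
  A, B: AnsiString;
begin
  Pool := TStringList.Create;
  try
    Pool.CaseSensitive := True;   { 'Hello' and 'hello' stay distinct }
    Pool.Sorted := True;

    { Two separately built strings with identical contents. }
    A := Intern(StringOfChar('a', 50));
    B := Intern(StringOfChar('a', 50));

    { Both now refer to the pooled copy, so the pointers match. }
    WriteLn(Pointer(A) = Pointer(B));
  finally
    Pool.Free;
  end;
end.
```

Each call to Intern costs a binary search with full string comparisons, so this only pays off when the same values repeat often and are compared many times afterwards.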

Michael.