[fpc-pascal]ansistrings

Mon Mar 1 11:27:01 CET 2004

Hey all,

I've been poking around the sources and documentation for some insight into the details of how ansistrings are implemented, and I am left with some questions.

It would be nice if, when comparing two ansistrings, fpc would first check whether the two pointers point to the same spot in memory, i.e. the same TAnsiRec. If they do, a potentially long operation is reduced to a simple comparison of two memory addresses, which probably takes only one processor cycle.

Looking at fpc_ansistr_compare in astrings.inc, and at cgadd.addstring (the only function that seems to call fpc_ansistr_compare), it appears not to do this. Perhaps I'm wrong? I don't _really_ understand what the code is doing. I believe the sources I'm looking at are 1.0.10.

If this quick comparison is in fact not implemented, I'd like to do it myself. (There are a number of places where I am checking long ansistrings for equality, and there is a reasonable chance that both pointers are pointing to the same address.)

( @s1[1] = @s2[1] ) seems to give the right result. Is this the best way? Or is it quicker/slower to use ( pointer(s1) = pointer(s2) ), which is no doubt more elegant?
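To make the idea concrete, here is a minimal sketch of the kind of wrapper I have in mind. The name SameAnsiStr is my own invention, not something from the RTL, and I am assuming {$mode objfpc} with {$H+} so that the strings are ansistrings. It uses the pointer() form, which also copes with empty strings, since an empty ansistring is just a nil pointer.

```pascal
program FastEqualDemo;
{$mode objfpc}{$H+}

{ Hypothetical helper: try the cheap address comparison first and
  fall back to the ordinary content comparison only when needed. }
function SameAnsiStr(const a, b: AnsiString): Boolean;
begin
  if Pointer(a) = Pointer(b) then
    Result := True            // same data block (or both empty)
  else if Length(a) <> Length(b) then
    Result := False           // different lengths can never be equal
  else
    Result := (a = b);        // full comparison as a last resort
end;

var
  s1, s2, s3: AnsiString;
begin
  s1 := 'hello';
  s1 := s1 + ' world';        // built at run time, lives on the heap
  s2 := s1;                   // plain assignment: s2 shares s1's data
  s3 := 'hello';
  s3 := s3 + ' world';        // same content, separate allocation

  writeln(SameAnsiStr(s1, s2));   // answered by the pointer check
  writeln(SameAnsiStr(s1, s3));   // answered by the content comparison
end.
```

Whether this actually beats the built-in "=" depends on what the RTL already does internally, which is exactly what I am unsure about.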

Now on to the second question... getting those ansistrings pointed to the same address! (Some of them already are, but I'd like to get more...)

I was kind of surprised to find that

  s1 := 'hello';
  s2 := 'hello';
  writeln ( pointer(s1) = pointer(s2) );     { writes FALSE }

Thus I assume that
  readln (s1);
  readln (s2);    { ...would NEVER point them at the same address }

Of course, checking every string against every other string would incur an absurd performance hit in most cases. What I'd really like is to have a relatively small number of constant strings that could be compared against, and only when reading in a data file, or perhaps certain fields in a data file (yeah, reading will take longer). Then if my data file (and thus my filled pascal array) has 1,000 instances of "some_complex_but_often_identically_repeated_data_value", I get the following:
  - lots of memory savings
  - operator "=" gives a very fast TRUE result when both strings point to the same address

In fact, all the string comparison operators return a particular value if the two are equal, so they could all give a fast result in this case. This could happen both when comparing datum values against each other, and when comparing them to a constant string in my code.
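If I end up doing this myself, I imagine something roughly like the sketch below: a small pool of known strings that every freshly read value is looked up in. Pool, MaxPool and Intern are names I made up for illustration, and the linear search is only sensible because the pool is meant to stay small. Again this assumes {$mode objfpc} with {$H+}.

```pascal
program InternDemo;
{$mode objfpc}{$H+}

const
  MaxPool = 64;               // hypothetical upper bound on pooled strings

var
  Pool: array[0..MaxPool - 1] of AnsiString;
  PoolCount: Integer;

{ Hypothetical helper: return the pooled instance whose content equals s,
  adding s to the pool first if it is new and there is still room. }
function Intern(const s: AnsiString): AnsiString;
var
  i: Integer;
begin
  for i := 0 to PoolCount - 1 do
    if Pool[i] = s then
    begin
      Result := Pool[i];      // hand back the shared instance
      Exit;
    end;
  if PoolCount < MaxPool then
  begin
    Pool[PoolCount] := s;     // remember this value for later lookups
    Inc(PoolCount);
  end;
  Result := s;
end;

var
  a, b: AnsiString;
begin
  PoolCount := 0;
  // Copy builds two separate run-time strings with identical content,
  // standing in for two fields read from a data file.
  a := Intern(Copy('some_value_x', 1, 10));
  b := Intern(Copy('some_value_y', 1, 10));
  writeln(Pointer(a) = Pointer(b));
end.
```

After interning, a and b are expected to share one address, so the pointer comparison succeeds and only one copy of the text is kept in memory. The price is the lookup on every read, which is the slowdown I mentioned above.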

Do resource strings offer this kind of intelligence, or are they designed for a completely different purpose, and offer no performance improvement for this special application?

Is there some special mode or compiler directive that does more string uniqueness checking, including at compile time (e.g. to find my two identical 'hello' assignments)?

Or shall I bite the bullet and decide whether I need to implement some of this stuff myself?

(( Typecasting my data into something that's not literal strings would of course be the best thing to do. However, I'm not quite ready for it yet, and would like to do what I can to help performance until I get to the point where I can put my data in pascal records. ))

Cheers!
-David