[fpc-devel] Re: dominant short strings in compiler source

L505 fpc505 at z505.com
Thu May 18 21:50:54 CEST 2006


> > Also assembler symbols/labels should get extended to strings > 255 in the
> > future because there is already a bug open in which it is demonstrated that it
> > is possible to create too long labels which makes a program uncompilable.
> > Or some scheme derived which makes sure that labels never get larger than 255
> > chars.
> > (Typed constant within 2 or 3 nested functions)

> This is exactly a thing that will not happen. Longer assembler labels
> have no useful advantage for the end user; they are only used
> hidden from the user. Long names only cost memory. For mangled names, the compiler
> already switches to hashes instead of type lists. If there is any other
> reason a symbol gets too long, it can be handled in a similar way.

> > Additionally even the ppc64 compiler isn't able to cycle when compiled with
> > -Cg because of the shortstring limitation, a few symbols get truncated, which
> > makes the assembler fail.
> > This is because the assembler syntax for declaring a symbol in the GOT on this
> > platform requires the compiler to add the symbol name twice to the directive.
> > This effectively limits symbol names to around 120 chars already. Which is, as
> > mentioned before, already not the case for the compiler as it is.

> > The solution is to generate shorter symbols. Hash the symbol name if it
> > gets too long.

That's one solution, but it's not the only one.
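
For reference, the hashing approach from the quote can be sketched roughly as below. This
is Python for brevity, not compiler code: the function name, the "$" separator, the choice
of SHA-1 and the 120 default (taken from the "around 120 chars" figure quoted above) are
all my own assumptions, not what the compiler actually does.

```python
import hashlib

# Assumed limit, borrowed from the "around 120 chars" figure in the quoted text.
MAX_LABEL_LEN = 120

def shorten_label(name, max_len=MAX_LABEL_LEN):
    """Return name unchanged if it fits, else a readable prefix plus a hash suffix."""
    if len(name) <= max_len:
        return name
    # Hash the full name so two long names sharing a prefix still differ.
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()[:16]
    keep = max_len - len(digest) - 1  # one character reserved for the separator
    return name[:keep] + "$" + digest
```

The prefix keeps the label somewhat readable in assembler listings, while the hash suffix
is what keeps long labels apart (barring a hash collision).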

I can see people arguing that a short string limited to 50 elements is enough. I seriously
can.

I think you guys may be living in a 255 cave, simply because that's all we have to deal
with at this time. Some say that ansistrings via sysutils might be the way to go -
personally I think sysutils has no place in the compiler core, and the compiler core should
have tight custom units with no end-user units like sysutils. One way to accomplish this,
like I've already mentioned, is to use a Dos unit based on shortstring/longstring/array of
string, using shortstrings where necessary, arrays of strings where necessary, and arrays of
chars or longstrings where necessary. An array of char is just a dumb longstring, that's
all. The upgraded Dos unit could contain some functions pulled in from sysutils, but not
actual sysutils in the uses clause - just some optimized sysutils routines pulled in and
put into the upgraded Dos unit. Keep the old Dos unit for user compatibility and name
the new upgraded Dos unit anything - newdos.pp, whatever.

I'd be willing to help on this one and do some work, but unfortunately since we're all
disagreeing it means we can't do any work until we come to an agreement. Once again, it's
not just about having a team of programmers doing the grunt work - but also about having
some sort of consensus or agreement before doing the work. Otherwise one of us will waste
our time submitting a patch which won't be committed because some other folks don't like
the way it was done.

If you use an array of strings you eventually have to combine these arrays of strings
together into one common buffer to send to exec(), so you are reinventing the longstring
or the ansistring, or an array of char if it is one big piece being sent in the end
anyway. The longstring is faster. It's perfectly okay that you don't want to implement
longstring because it is hard work - but at least admit that it is useful, whether it is
implemented or not. It's like saying ansistrings are useless garbage because we haven't
implemented them yet. No, they are useful - but maybe they are hard to implement. I doubt
a longstring is hard to implement compared to something like templates/generics, though.
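
To illustrate the point about ending up with one common buffer anyway, here is a rough
sketch (Python for brevity, not Pascal). The function names are made up for illustration
and the space-separated layout is an assumption; it does no quoting of arguments that
contain spaces.

```python
SHORTSTRING_MAX = 255  # capacity of a Pascal shortstring

def join_args(args):
    """Combine separate arguments into one contiguous buffer, space-separated."""
    buf = bytearray()
    for arg in args:
        if buf:
            buf += b" "  # separator between arguments, none before the first
        buf += arg.encode("ascii")
    return bytes(buf)

def fits_in_shortstring(buf):
    """True if the combined buffer would still fit in a single shortstring."""
    return len(buf) <= SHORTSTRING_MAX
```

Each argument on its own can be well under 255 characters while the combined buffer is
not, which is where a longstring or array of char comes in.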

But don't take offense at my message - I'm sure you all know it is normal for this to
happen among programmers - discussing topics and arguing their brains out.

Can someone tell me how slow/fast a dynamic array is compared to a fixed one? Say you used
a dynamic array of chars or a dynamic array of shortstrings - would the dynamic array be
slow in general? Maybe we will have to resort to benchmarks using the cpu timer.
And then there are the fixed array of shortstrings and the fixed array of chars to
compare against, too.
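
The kind of benchmark I have in mind would have roughly this shape. This is a Python
sketch only, with made-up function names; a preallocated list stands in for the fixed
array and an appended-to list for the dynamic one. Numbers from it say nothing about FPC;
the real test would have to be written in Pascal against the actual array types.

```python
import time

def fill_fixed(n):
    """Preallocate n slots up front, then assign by index (fixed-array analogue)."""
    buf = [""] * n
    for i in range(n):
        buf[i] = "x"
    return buf

def fill_dynamic(n):
    """Start empty and grow one element at a time (dynamic-array analogue)."""
    buf = []
    for _ in range(n):
        buf.append("x")
    return buf

def time_it(func, n, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(n)
        best = min(best, time.perf_counter() - start)
    return best

if __name__ == "__main__":
    n = 200_000
    print("fixed  :", time_it(fill_fixed, n))
    print("dynamic:", time_it(fill_dynamic, n))
```

Taking the best of several runs rather than the average cuts down on noise from whatever
else the machine is doing.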
