[fpc-devel] Re: dominant short strings in compiler source

Daniël Mantione daniel.mantione at freepascal.org
Thu May 18 22:38:45 CEST 2006



Op Thu, 18 May 2006, schreef L505:

> That's one solution, that's not the only solution.

Very right. It is a trade-off. Do you fix the shortstring issue 
and continue to get their benefits, or do you abandon them, 
rewrite large parts of the compiler and pay the performance/memory 
usage price?

> I can see people arguing that a 50 element limited short string is enough, I seriously
> can.
> 
> I think you guys may be living in a 255 cave, simply because that's all we have to deal
> with at this time. Some say that ansistrings might be the way to go using sysutils -
> personally I think sysutils has no place in the compiler core and the compiler core should
> have tight custom units with no end user units like sysutils.

I don't agree here. The compiler is not a special application when it 
comes to interfacing with the operating system. We should eat our own 
dogfood in order to provide the users with the best runtime library. I 
agree there are technical issues with sysutils, which might warrant a 
new unit. On the other hand, you have to consider the cost/benefit. 
Sysutils is pretty useful, and despite the issues, it is possible to 
write very good software with it. Separate units increase the amount of 
work needed to port the compiler.

> One way to accomplish this,
> like I've already mentioned, is to use shortstring/longstring/array of string/ based Dos
> unit, using shortstrings where necessary, arrays of strings where necessary, and arrays of
> chars or longstrings where necessary. An array of char is just a dumb longstring, that's
> all. Upgraded Dos unit could contain some functions pulled in from sysutils, but not
> actual sysutils in the uses clause - just some optimized sysutils pulled in and put into
> the upgraded dos unit. Still keeping the old Dos unit for compatibility for users, name
> the new upgraded dos unit anything - newdos.pp, whatever.
> 
> I'd be willing to help on this one and do some work, but unfortunately since we're all
> disagreeing it means we can't do any work until we come to an agreement. Once again, it's
> not just about having a team of programmers doing the grunt work - but also about having
> some sort of consensus or agreement before doing the work. Otherwise one of us will waste
> our time submitting a patch which won't be committed because some other folks don't like
> the way it was done.

Remember there are more issues with Dos than just the string length. For 
example, findfirst/findnext are not really modern either. A modern unit 
would be a design from scratch. Actually, I think it would be nice to have 
one.

> If you use an array of strings you eventually have to combine these array of strings
> together into one common buffer to send to exec(), so you are reinventing the longstring
> or the ansistrings, or an array of char if it is one big piece being sent in the end
> anyway. The longstring is faster.

Not entirely true. On Dos-like operating systems the combining does 
indeed happen in a buffer that is allocated using getmem. On Unix-like 
systems, the parameters are currently combined to pass them to dos.exec, 
then split again to pass them to execve, which is a strange situation.
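
To illustrate the combine-then-split round trip, here is a minimal 
sketch. JoinArgs and SplitArgs are hypothetical helpers written for this 
example, not the actual Dos unit code, and the splitting assumes plain 
space separation with no quoting rules.

```pascal
program ArgRoundTrip;
{$mode objfpc}{$H+}

type
  TArgList = array of AnsiString;

{ Hypothetical helper: join the arguments into one command line,
  separated by single spaces. }
function JoinArgs(const Args: array of AnsiString): AnsiString;
var
  i: Integer;
begin
  Result := '';
  for i := Low(Args) to High(Args) do
  begin
    if i > Low(Args) then
      Result := Result + ' ';
    Result := Result + Args[i];
  end;
end;

{ Hypothetical helper: split a command line on spaces again.
  Assumption: no quoting or escaping is handled. }
function SplitArgs(const Line: AnsiString): TArgList;
var
  i, Start, N: Integer;
begin
  SetLength(Result, 0);
  N := 0;
  Start := 1;
  for i := 1 to Length(Line) + 1 do
    if (i > Length(Line)) or (Line[i] = ' ') then
    begin
      { Store the piece between Start and i, skipping empty pieces. }
      if i > Start then
      begin
        SetLength(Result, N + 1);
        Result[N] := Copy(Line, Start, i - Start);
        Inc(N);
      end;
      Start := i + 1;
    end;
end;

var
  Parts: TArgList;
  i: Integer;
begin
  { Combine, as when building one string for an exec-style call ... }
  { ... then split again, as when building an argument vector. }
  Parts := SplitArgs(JoinArgs(['-o', 'hello', 'hello.pp']));
  for i := 0 to High(Parts) do
    WriteLn(i, ': ', Parts[i]);
end.
```

In this sketch an argument that itself contains a space does not survive 
the round trip as a single argument, which is one reason the detour 
through a single string is awkward.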

> It's perfectly okay that you don't want to implement
> longstring because it is hard work - but at least admit that it is useful, whether it is
> implemented or not. It's like saying ansistrings are useless garbage because we haven't
> implemented them yet. No, they are useful - but maybe they are hard to implement. I doubt
> a longstring is hard to implement compared to something like templates/generics, though.
> 
> But don't take my message as an offense - I'm sure you all know it is normal for this to
> happen among programmers - discussing topics and arguing their brains out.

Don't worry :) 

> Can someone tell me how slow/fast a dynamic array is compared to a fixed one? Say you used
> a dynamic array of chars or dynamic array of shortstrings - would the dynamic array be
> slow on a general basis? Maybe we will have to resort to benchmarks using the cpu timer.
> And then there is also a fixed array of shortstrings or a fixed array of chars too.

A dynamic array is like an ansistring: both are reference counted and 
freed automatically, and therefore you get an implicit try/finally. 
Generally, we use reallocmem inside the compiler, again for performance.
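
As a rough sketch of the difference (the routine names and the buffer 
size are made up for illustration and are not taken from the compiler 
source):

```pascal
program GrowBuffer;
{$mode objfpc}

const
  Count = 1000;  { arbitrary size, chosen for the example }

{ Managed variant: Dyn is a managed local, so the compiler releases it
  automatically when the routine exits, using an implicit try/finally. }
function FillManaged: Integer;
var
  Dyn: array of Char;
  i: Integer;
begin
  SetLength(Dyn, Count);
  for i := 0 to Count - 1 do
    Dyn[i] := 'a';
  Result := Length(Dyn);
end;

{ Manual variant: a plain pointer grown with ReallocMem. Nothing is
  released automatically, so FreeMem must be called explicitly. }
function FillManual: Integer;
var
  Buf: Pointer;
  i: Integer;
begin
  Buf := nil;
  ReallocMem(Buf, Count);
  for i := 0 to Count - 1 do
    PChar(Buf)[i] := 'a';
  Result := Count;
  FreeMem(Buf);
end;

begin
  WriteLn(FillManaged, ' ', FillManual);
end.
```

How much the implicit try/finally costs in practice depends on the 
target and the code, so it is worth measuring before drawing conclusions.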

Daniël

