[fpc-pascal] Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()
Michael Van Canneyt
michael at freepascal.org
Sat Feb 3 20:13:37 CET 2007
On Sat, 3 Feb 2007, Marco van de Voort wrote:
> > On Sat, 3 Feb 2007, Marco van de Voort wrote:
> >
> > > > Yes it will, because the reallocations don't happen as often.
> > > > The sorting introduces an overhead anyway, whether you set capacity or not.
> > >
> > > Yes, but I was talking about slowness in general, not just from the heap.
> > >
> > > And TStringList with those huge internal list has to move on avg half of the
> > > array. If TStringList had an extra indirection (array of ptr to blk of ptrs)
> > > it would be less so.
> >
> > Eh ? What happens if you do a Insert(0,'SomeString') ? Don't you need to move
> > across all blocks ? Or will you just grow the first block ?
>
> Assume 1M elements.
>
>
> Then the tstringlist case is 1M*sizeof(pointer) bytes. Inserting a new node on avg moves
> (1M*sizeof(pointer))/2 bytes. Overhead of index is only heapmgr overhead for the big
> block.
>
> Assume and (max) 1k elements per block. Means we have an array of 1000
> pointers, each to a block of 1000 ptrs. Inserting means a binsearch in the
> first array to locate the block, and then there are two cases:
>
> 1 there is slack in the block: insert into the block, on avg
> (1000*sizeof(pointer)/2) bytes are moved.
> 2 the block is full, you have to insert a new block in the toplevel index,
> ( avg 1000*sizeof(pointer)/2) and then redivide the block between the two
> (again about 1000*sizeof(pointer)/2 roughly.
>
> If you do the splitting smart, 1 is way more common than 2.
>
> So that is 1k*sizeof(pointer)/2 vs 1M*sizeof(pointer)/2.
It's what I thought, yes.
>
> > Anyway, it could be a nice idea to implement as TLargeStringList.
>
> Yes, but I'd get rid of the string splitting stuff etc. IMHO they don't
> belong in the same class.
Why, they make coding much easier ?
I found myself having 10-20 custom functions, all accepting a TStrings + some args.
The coding is easier if it's just a method of TStrings.
But there is a way around that; Encapsulation.
> > > > The correct procedure IMHO is
> > > > - Set capacity
> > > > - Load
> > > > - Sort
> > >
> > > > I tested such things with an N^3 algorithm for my daytime job, and the
> > > > difference is very noticeable.
> > >
> > > With a single array or a multilevel one?
> >
> > Multilevel.
>
> Then it is not really tstringlist. Or do you mean tstringlists as major
> index with tstringlists under each node? That is IMHO already a custom
> class.
I don't consider that a custom class, I didn't have to declare extra classes for it :-)
>
> > > > All this doesn't exclude that a specialized class may be more suitable for
> > > > the job.
> > >
> > > To be honest, the only good point of TStringList seems to be that it is
> > > default available on all Object Pascal. The specialised stuff (splitting
> > > strings) is also plagued with oddities. (most notably the fact that
> > > characters under space are always used as separator)
> >
> > Why do you call this oddities ? For GUI programming (and that's what it was
> > implemented for) it makes perfect sense.
>
> The problem is that it is used for way more. It is container type, splitter,
> GUI data backstore etc. One can argue about how it is was meant originally,
> but this is the practice now. And a lot of code scales bad because of it.
Ok, that means we just have to separate out the container type.
>
> > > > I just want to illustrate that, if programmed correctly, TList,
> > > > TStringList and friends can still get you a long way...
> > >
> > > I think that the lengths to which people will go to stick to them paints
> > > what is needed to make a serious effort to make them legacy.
> >
> > I fail to see why, but no doubt you have your reasons. I'd like to see a complete
> > list of 'issues' with TStrings (not TStringList, that's just a specific implementation).
>
> I think we would have to agree first on the target of TStrings and -list.
> But the main reason is simply bad scaling. I want to be able to use a
> container type, and not first wrap it, or risk scaling issues later.
>
> A language/RTL should provide such type, and not position a type originally
> meant for GUI bindings with internal limits as basetype for datastructures.
>
> Not lots of new machines have memories that easily allow such large
> datastructures.
>
> The end conclusion can also be we need to promote e.g. contnrs types more. My
> remark here was more the backwards delphi compat (e.g. FPC extensions should
> not be in contnrs, but in a different unit that also compiles on Delphi)
I understand.
>
> > Maybe we can then create a TFPStrings which would be interface compatible,
> > (in the sense that it can replace TStrings without source changes. It can
> > have internal differences, like the space handling) but with a more sensible
> > behaviour ?
>
> IMHO
> - linesplitting and container function should be separated.
You can do that using encapsulation, see TList/TFPList.
Tlist adds the notification mechanism, which I've never
used in any code I wrote, but which exists in Delphi
and slows things down, as there is no way to switch it off.
I would say a TFPStringList/TExtStringList where TFPStringList only
provides storage, and TExtStringList provides all the rest, but uses
TFPStringList for storage.
(names are not meant as binding, just examples to show the point)
> - GUI and non GUI interfacing (data structures) should be separated.
I'm afraid I don't understand what you mean with this ?
> - There should be default containers that are not vectors (there are some in
> contnrs). IOW more java style iterators.
I proposed exactly that in the discussion about Generics.
>
> > > There should be a set of container classes in a separate unit (a unit not
> > > existing in Delphi most notably) that is Open Source and works on Delphi
> > > too.
> >
> > Like Decal was set up to be ?
>
> More or less yes, with one big problem here. Decal abused interfaces and
> variants as a way around missing generics, causing larger performance
> problems than it solved.
>
> However generics support in FPC won't solve it either, since then TStringList and TList will
> still be the lowest common denomitor. And a lot of programmers need to keep
> Delphi compat. Some coordination to have some congruency between generic
> and not generic types might be wise.
>
> > I'm all for starting such a thing. I would be the first one to use such a beast if it was better.
> > It was the prime reason for implementing TFPlist.
>
> ...Which is also in a unit that you can't simply USE in Delphi.
It isn't meant to be.
I want people to move to FPC because it actually provides better algorithms,
while still enabling them to compile their old Delphi code. We must have some
unique selling proposition.
Like a well-known man once said: the goal is world domination. ;-)
Seriously: I'm all for providing better base classes.
As long as they're not generics based, it's fine with me.
Michael.
More information about the fpc-pascal
mailing list