[fpc-pascal] Question on how to avoid memory trouble using FindFirst(), FindNext() and FindClose()
Marco van de Voort
marcov at stack.nl
Sat Feb 3 19:44:58 CET 2007
> On Sat, 3 Feb 2007, Marco van de Voort wrote:
>
> > > Yes it will, because the reallocations don't happen as often.
> > > The sorting introduces an overhead anyway, whether you set capacity or not.
> >
> > Yes, but I was talking about slowness in general, not just from the heap.
> >
> > And TStringList with those huge internal list has to move on avg half of the
> > array. If TStringList had an extra indirection (array of ptr to blk of ptrs)
> > it would be less so.
>
> Eh ? What happens if you do a Insert(0,'SomeString') ? Don't you need to move
> across all blocks ? Or will you just grow the first block ?
Assume 1M elements.
Then the tstringlist case is 1M*sizeof(pointer) bytes. Inserting a new node on avg moves
(1M*sizeof(pointer))/2 bytes. Overhead of index is only heapmgr overhead for the big
block.
Assume and (max) 1k elements per block. Means we have an array of 1000
pointers, each to a block of 1000 ptrs. Inserting means a binsearch in the
first array to locate the block, and then there are two cases:
1 there is slack in the block: insert into the block, on avg
(1000*sizeof(pointer)/2) bytes are moved.
2 the block is full, you have to insert a new block in the toplevel index,
( avg 1000*sizeof(pointer)/2) and then redivide the block between the two
(again about 1000*sizeof(pointer)/2 roughly.
If you do the splitting smart, 1 is way more common than 2.
So that is 1k*sizeof(pointer)/2 vs 1M*sizeof(pointer)/2.
> Anyway, it could be a nice idea to implement as TLargeStringList.
Yes, but I'd get rid of the string splitting stuff etc. IMHO they don't
belong in the same class.
> > > The correct procedure IMHO is
> > > - Set capacity
> > > - Load
> > > - Sort
> >
> > > I tested such things with an N^3 algorithm for my daytime job, and the
> > > difference is very noticeable.
> >
> > With a single array or a multilevel one?
>
> Multilevel.
Then it is not really tstringlist. Or do you mean tstringlists as major
index with tstringlists under each node? That is IMHO already a custom
class.
> > > All this doesn't exclude that a specialized class may be more suitable for
> > > the job.
> >
> > To be honest, the only good point of TStringList seems to be that it is
> > default available on all Object Pascal. The specialised stuff (splitting
> > strings) is also plagued with oddities. (most notably the fact that
> > characters under space are always used as separator)
>
> Why do you call this oddities ? For GUI programming (and that's what it was
> implemented for) it makes perfect sense.
The problem is that it is used for way more. It is container type, splitter,
GUI data backstore etc. One can argue about how it is was meant originally,
but this is the practice now. And a lot of code scales bad because of it.
> > > I just want to illustrate that, if programmed correctly, TList,
> > > TStringList and friends can still get you a long way...
> >
> > I think that the lengths to which people will go to stick to them paints
> > what is needed to make a serious effort to make them legacy.
>
> I fail to see why, but no doubt you have your reasons. I'd like to see a complete
> list of 'issues' with TStrings (not TStringList, that's just a specific implementation).
I think we would have to agree first on the target of TStrings and -list.
But the main reason is simply bad scaling. I want to be able to use a
container type, and not first wrap it, or risk scaling issues later.
A language/RTL should provide such type, and not position a type originally
meant for GUI bindings with internal limits as basetype for datastructures.
Not lots of new machines have memories that easily allow such large
datastructures.
The end conclusion can also be we need to promote e.g. contnrs types more. My
remark here was more the backwards delphi compat (e.g. FPC extensions should
not be in contnrs, but in a different unit that also compiles on Delphi)
> Maybe we can then create a TFPStrings which would be interface compatible,
> (in the sense that it can replace TStrings without source changes. It can
> have internal differences, like the space handling) but with a more sensible
> behaviour ?
IMHO
- linesplitting and container function should be separated.
- GUI and non GUI interfacing (data structures) should be separated.
- There should be default containers that are not vectors (there are some in
contnrs). IOW more java style iterators.
> > There should be a set of container classes in a separate unit (a unit not
> > existing in Delphi most notably) that is Open Source and works on Delphi
> > too.
>
> Like Decal was set up to be ?
More or less yes, with one big problem here. Decal abused interfaces and
variants as a way around missing generics, causing larger performance
problems than it solved.
However generics support in FPC won't solve it either, since then TStringList and TList will
still be the lowest common denomitor. And a lot of programmers need to keep
Delphi compat. Some coordination to have some congruency between generic
and not generic types might be wise.
> I'm all for starting such a thing. I would be the first one to use such a beast if it was better.
> It was the prime reason for implementing TFPlist.
...Which is also in a unit that you can't simply USE in Delphi.
More information about the fpc-pascal
mailing list