[fpc-pascal] sorting and merging array of records

Tomas Hajny XHajT03 at hajny.biz
Thu Jan 12 13:20:51 CET 2012


On Thu, January 12, 2012 03:34, waldo kitty wrote:
> On 1/11/2012 20:35, Tomas Hajny wrote:
>> On 11 Jan 12, at 17:46, waldo kitty wrote:
>>   .
>>   .
>>> 1. right now the compare is working on the catalog number
>>> (TTLERec.catnbr) and
>>> with duplicates:=FALSE there are no duplicates... however, i need to be
>>> able to
>>> choose which record to keep when there is a duplicate catnbr but the
>>> epoch
>>> (TTLERec.epoch) is different... right now it is throwing out all but
>>> the
>>> first... how can i tell it how to decide which one to throw away? i saw
>>> earlier
>>> that one of the parent objects has neat functions like AtPut which
>>> would easily
>>> allow me to overwrite an existing record that's too old... i just don't
>>> know if
>>> i can use that in the middle of an insert or a search or just exactly
>>> where i
>>> would even put code to do this...
>>
>> The best solution is probably overriding the Insert method to fit
>> your slightly modified logic. Copy the default implementation of
>> TSortedCollection.Insert as found in objects.pp in FPC source tree
>> and modify it according to your needs (if Search returns true, perform
>> the additional check for the epoch and depending on the result either
>> use AtInsert (equally to the default implementation) or Dispose the
>> previously stored record at position I ("I" returned by Search, "At
>> (I)" used to access the previously stored record) and then use AtPut.
>
> i've been mulling this over for a few hours and i'm wondering if
> overriding the
> insert method or simply doing the logic in my importation routine would be
> best... right now my code blindly inserts the record but i could do a
> search for
> a matching "key" (catnbr by keyof ??) and then if i find a match, look at
> the
> epoch of the two records, one in vars and the other in the collection...
> if the
> one in the vars is newer, then do an AtPut to overwrite the one currently
> stored... otherwise, just go on to read the next file record into the vars
> and
> check it from there...

Yes, you can certainly do that. The Insert method already calls Search
and checks the Duplicates field itself. If you do the search on your side
(rather than overriding the Insert method) and then still call Insert, the
same search will be performed twice whenever the record really has to be
inserted. However, both approaches are certainly possible.
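
A minimal sketch of the overriding approach follows. The declaration of
TTLERec is not shown in this thread, so the layout below (a TObject
descendant with only CatNbr and Epoch fields) and the name TTLECollection
are assumptions made for illustration. The sketch also ignores the
Duplicates field and always keeps exactly one record per catalog number.

```pascal
program KeepNewest;

uses
  Objects;

type
  { Hypothetical layout; the real TTLERec is not shown in the thread. }
  PTLERec = ^TTLERec;
  TTLERec = object(TObject)
    CatNbr: string[5];
    Epoch: Double;
    constructor Init(const ACatNbr: string; AEpoch: Double);
  end;

  PTLECollection = ^TTLECollection;
  TTLECollection = object(TSortedCollection)
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
    procedure Insert(Item: Pointer); virtual;
  end;

constructor TTLERec.Init(const ACatNbr: string; AEpoch: Double);
begin
  inherited Init;
  CatNbr := ACatNbr;
  Epoch := AEpoch;
end;

{ The default KeyOf returns the item itself, so both keys are PTLERec. }
function TTLECollection.Compare(Key1, Key2: Pointer): Sw_Integer;
begin
  if PTLERec(Key1)^.CatNbr < PTLERec(Key2)^.CatNbr then
    Compare := -1
  else if PTLERec(Key1)^.CatNbr > PTLERec(Key2)^.CatNbr then
    Compare := 1
  else
    Compare := 0;
end;

procedure TTLECollection.Insert(Item: Pointer);
var
  I: Sw_Integer;
  Old: Pointer;
begin
  if not Search(KeyOf(Item), I) then
    AtInsert(I, Item)            { new catalog number: store as usual }
  else if PTLERec(Item)^.Epoch > PTLERec(At(I))^.Epoch then
  begin
    Old := At(I);                { stored record is older: replace it }
    AtPut(I, Item);
    FreeItem(Old);
  end
  else
    FreeItem(Item);              { new record is not newer: dispose it }
end;

var
  C: PTLECollection;
begin
  C := New(PTLECollection, Init(16, 16));
  C^.Insert(New(PTLERec, Init('25544', 12011.5)));
  C^.Insert(New(PTLERec, Init('25544', 12012.5)));  { newer epoch }
  C^.Insert(New(PTLERec, Init('25544', 12010.5)));  { older epoch }
  C^.Insert(New(PTLERec, Init('00005', 12011.0)));
  WriteLn('Records kept: ', C^.Count);
  Dispose(C, Done);              { Done frees the remaining items too }
end.
```

Note that this Insert takes ownership of the pointer passed to it: the
caller must not use the pointer afterwards, because the record may already
have been disposed.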


> without looking at the "code to copy" if i want to override the insert
> method,
> it almost seems that there's a bug if it just throws away the record we're
> trying to insert... it would seem that if the code locates a "duplicate"
> record,
> it would properly dispose of unwanted data... unless i perform the

No. There are two tasks: dynamic allocation of the object, and its
insertion. Although you perform both on one line in your program, they
are distinct. In the general case, the insert code cannot know whether
you still need the (previously allocated) object or not. The insert code
doesn't throw anything away - you do, by not storing the result of the
"New (PTLERec, Init (..." in a variable and only passing it as a
parameter to a method which may or may not store it somewhere else.
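
A minimal sketch of keeping the pointer on the caller's side, so that a
record rejected by the unmodified Insert can still be disposed. The
TTLERec layout and the helper name AddOrDispose are assumptions made for
illustration. The check uses Count because, with Duplicates = False,
IndexOf in objects.pp reports the position of the matching key without
comparing pointers, so it cannot tell "my" record from the stored one.

```pascal
program DisposeRejected;

uses
  Objects;

type
  { Hypothetical layout; the real TTLERec is not shown in the thread. }
  PTLERec = ^TTLERec;
  TTLERec = object(TObject)
    CatNbr: string[5];
    Epoch: Double;
    constructor Init(const ACatNbr: string; AEpoch: Double);
  end;

  PTLECollection = ^TTLECollection;
  TTLECollection = object(TSortedCollection)
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
  end;

constructor TTLERec.Init(const ACatNbr: string; AEpoch: Double);
begin
  inherited Init;
  CatNbr := ACatNbr;
  Epoch := AEpoch;
end;

function TTLECollection.Compare(Key1, Key2: Pointer): Sw_Integer;
begin
  if PTLERec(Key1)^.CatNbr < PTLERec(Key2)^.CatNbr then
    Compare := -1
  else if PTLERec(Key1)^.CatNbr > PTLERec(Key2)^.CatNbr then
    Compare := 1
  else
    Compare := 0;
end;

var
  C: PTLECollection;

{ Allocation and insertion kept as two visible steps. }
procedure AddOrDispose(const CatNbr: string; Epoch: Double);
var
  P: PTLERec;
  Before: Sw_Integer;
begin
  P := New(PTLERec, Init(CatNbr, Epoch));  { task 1: allocate }
  Before := C^.Count;
  C^.Insert(P);                            { task 2: maybe store }
  if C^.Count = Before then
    Dispose(P, Done);                      { not stored: still ours to free }
end;

begin
  C := New(PTLECollection, Init(16, 16));  { Duplicates defaults to False }
  AddOrDispose('25544', 12011.5);
  AddOrDispose('25544', 12012.5);          { duplicate key, gets disposed }
  WriteLn('Records kept: ', C^.Count);
  Dispose(C, Done);
end.
```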


> previously
> mentioned logic checks manually to catch this, i don't see this
> happening... the
> number of records left in memory at the end of the program's execution is
> the
> same number as those not added because they are "dupes"...

There are certainly different approaches possible. Yes, you could e.g.
have a static object (allocated on stack or as global data), populate it
with the values read from the file first, search the collection using this
static object and only allocate a new one dynamically if you need to
insert it.
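
A minimal sketch of the static-object approach, again assuming a
hypothetical TTLERec layout (the real one is not shown in this thread).
The names Probe and ImportRecord are illustrative only. A record is
allocated on the heap only when it is actually going to be stored.

```pascal
program StaticProbe;

uses
  Objects;

type
  { Hypothetical layout; the real TTLERec is not shown in the thread. }
  PTLERec = ^TTLERec;
  TTLERec = object(TObject)
    CatNbr: string[5];
    Epoch: Double;
    constructor Init(const ACatNbr: string; AEpoch: Double);
  end;

  PTLECollection = ^TTLECollection;
  TTLECollection = object(TSortedCollection)
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
  end;

constructor TTLERec.Init(const ACatNbr: string; AEpoch: Double);
begin
  inherited Init;
  CatNbr := ACatNbr;
  Epoch := AEpoch;
end;

function TTLECollection.Compare(Key1, Key2: Pointer): Sw_Integer;
begin
  if PTLERec(Key1)^.CatNbr < PTLERec(Key2)^.CatNbr then
    Compare := -1
  else if PTLERec(Key1)^.CatNbr > PTLERec(Key2)^.CatNbr then
    Compare := 1
  else
    Compare := 0;
end;

var
  C: PTLECollection;
  Probe: TTLERec;   { static object (global data), never inserted itself }

procedure ImportRecord(const CatNbr: string; Epoch: Double);
var
  I: Sw_Integer;
begin
  Probe.Init(CatNbr, Epoch);     { fill the static object, no heap use }
  if not C^.Search(C^.KeyOf(@Probe), I) then
    C^.AtInsert(I, New(PTLERec, Init(Probe.CatNbr, Probe.Epoch)))
  else if Probe.Epoch > PTLERec(C^.At(I))^.Epoch then
  begin
    C^.FreeItem(C^.At(I));       { dispose the older stored record }
    C^.AtPut(I, New(PTLERec, Init(Probe.CatNbr, Probe.Epoch)));
  end;
  { otherwise the stored record is kept and nothing was allocated }
end;

begin
  C := New(PTLECollection, Init(16, 16));
  ImportRecord('25544', 12011.5);
  ImportRecord('25544', 12012.5);  { newer: replaces the stored record }
  ImportRecord('25544', 12010.5);  { older: ignored }
  WriteLn('Records kept: ', C^.Count);
  Dispose(C, Done);
end.
```

Since Search already returns the insertion position, AtInsert is called
directly here and the search is not repeated by a later call to Insert.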


>>> 2. something else i'm running into is with duplicates:=FALSE, there's a
>>> whole
>>> bucket load of records that are not disposed of when i wipe out the
>>> collection... heaptrc hollers right nasty to me about'em on exit... i
>>> can only
>>> assume that these are the duplicates but i don't understand why they
>>> are still
>>> hanging around if insert or add threw them away already :/
>>
>> If you already had them in the collection, they're not added again.
>> You only dispose records added to the collection at the end, but
>> these are lost this way. You can also sort this out in the overridden
>> Insert method (if you don't want to use the newly created record,
>> dispose it).
>
> as above, i don't know that they are /in/ the collection... i'm
> (currently)
> simply calling the insert method and leaving the work up to it... if it
> should
> be handling this gracefully, it isn't... at least not in a way that
> heaptrc likes ;)

It behaves as specified; the responsibility for disposing of the
duplicates is on your side (since your code is what allocates them).


>>> [TRIM]
>>>>> data^.catnbr := Copy(data^.satdata[1],3,5);
>>>>> data^.epoch := Real_Value(data^.satdata[1],19,14);
>>>>> inc(sat_cnt);
>>>>> aTLECollection^.insert(data);
>>>>> dispose(data);
>>>>
>>>> Don't do this! You'll free the memory you allocated for your record.
>>>> The
>>>> collection will only contain a pointer to this data! (Many of the
>>>> rules I
>>>> mentioned for T(FP)List apply here as well)
>>>
>>> uh? for some reason i thought that the insert was copying the data to a
>>> place
>>> and then setting the pointer to there for the collection... i tried
>>> with and
>>> without the dispose(data) but it still looked the same... of course i
>>> was tired
>>> and might not have been looking at the right writeln debug output...
>>   .
>>   .
>>
>> No, Insert doesn't do any copying by default. In your current code,
>> the copying is performed by calling the PTLERec constructor (in
>> "New(PTLERec, Init(...") but if the pointer isn't inserted, it is
>> thrown away currently.
>
> thrown away by the default object insert code? why doesn't it properly
> wipe it
> out since it is tossing the pointer into the bitbucket?

Hopefully clarified above (you throw it away by not storing the pointer
at allocation time).


>> If you use the overridden Insert method as
>> suggested above, you can Dispose it from within the Insert call if
>> you don't need to insert it.
>
> understood... and with an eye on the above discussion about doing the
> logic
> myself using the available routines, i could also take care of it there by
> deciding to AtPut, or, insert or dispose instead of blindly performing an
> insert
> and hoping for the best??

Yes - it's up to you to decide how to structure the task.


> i'll probably have broken my code by the time you read this...  but i'll
> very
> likely be attempting to implement the logic in my Input_Satellite_List
> routine
> ;) OB-)

Possible, but it will likely cost some needless overhead (performance-wise)
if you still intend to call the Insert method in that case, since Insert
repeats the search.


> i just gotta figure out the way that the search and such work as in if
> they are
> returning pointers to the record or the record data itself... pointers are
> still
> very alien to me :? :/

Understood - powerful, but the power brings some complexity indeed.
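
A small sketch of what the collection routines hand back, assuming the
same hypothetical TTLERec layout as elsewhere (the real declaration is not
shown in this thread). Search returns a Boolean plus an index, and At
returns the stored pointer itself, not a copy of the record data.

```pascal
program PointerDemo;

uses
  Objects;

type
  { Hypothetical layout; the real TTLERec is not shown in the thread. }
  PTLERec = ^TTLERec;
  TTLERec = object(TObject)
    CatNbr: string[5];
    Epoch: Double;
    constructor Init(const ACatNbr: string; AEpoch: Double);
  end;

  PTLECollection = ^TTLECollection;
  TTLECollection = object(TSortedCollection)
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
  end;

constructor TTLERec.Init(const ACatNbr: string; AEpoch: Double);
begin
  inherited Init;
  CatNbr := ACatNbr;
  Epoch := AEpoch;
end;

function TTLECollection.Compare(Key1, Key2: Pointer): Sw_Integer;
begin
  if PTLERec(Key1)^.CatNbr < PTLERec(Key2)^.CatNbr then
    Compare := -1
  else if PTLERec(Key1)^.CatNbr > PTLERec(Key2)^.CatNbr then
    Compare := 1
  else
    Compare := 0;
end;

var
  C: PTLECollection;
  Data, Found: PTLERec;
  Key: TTLERec;
  I: Sw_Integer;
begin
  C := New(PTLECollection, Init(16, 16));
  Data := New(PTLERec, Init('25544', 12011.5));
  C^.Insert(Data);               { stores the pointer, copies nothing }

  Key.Init('25544', 0);          { only CatNbr matters for the search }
  if C^.Search(@Key, I) then     { Boolean result, position in I }
  begin
    Found := PTLERec(C^.At(I));  { At gives back the stored pointer }
    WriteLn('Same address: ', Found = Data);
    Found^.Epoch := 12012.5;     { changes the one and only record }
    WriteLn('Epoch via Data: ', Data^.Epoch:0:1);
  end;

  { No Dispose(Data) here: the collection owns it and frees it in Done. }
  Dispose(C, Done);
end.
```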

Tomas




