[fpc-pascal] TSortedCollection dupes ordering

Frederic Da Vitoria davitofrg at gmail.com
Thu Feb 6 10:23:04 CET 2014


2014-02-06 waldo kitty <wkitty42 at windstream.net>:

> On 2/5/2014 3:57 AM, Frederic Da Vitoria wrote:
> [...]
>
>> Once again I did not test this, but it seems to me that if Compare
>> returned -1 instead of 0, any duplicate would be inserted after the
>> existing equal items, because it would never be considered equal to any
>> other. But since you still want your collection to be able to choose
>> between skipping duplicates or keeping them, the Compare modification
>> would have to be slightly more subtle, something like:
>>
>>   if (Result = 0) and Duplicates then
>>     Result := -1;
>>
>> at the end of the Compare function.
>>
>
> i tried this and it kinda works... it keeps the entries in their original
> order but...
>
> 1. the final logging of each item's "record number" (position in the
> collection) is -1
> 2. it doesn't help to sort them in order by a different field
>
> i'm unsure how to handle this so that there is a secondary (sub) sort
> order: the main key is the master sort, and a secondary "key" is used
> when duplicates are allowed... ideally, the secondary key would retain
> the original order when the "secondary key" is exactly the same as a
> previous "secondary key"... but this is also problematic...
>
> to try to clarify: sometimes there are records released with exactly the
> same time stamp (epoch in the code i posted) but slightly different data
> within the record... there is another field that might be used to
> differentiate those BUT the records come from numerous locations... they
> may or may not use this "tertiary key", and if they do, their numbering
> of it may not match any other system's count... this is a problem i
> don't know how to solve, as there is no coordination between locations
> and no "master" coordinator for this "tertiary key"... it becomes even
> more apparent because my flow doesn't process any particular record
> containers before others... they are read and processed as they appear
> (OS ordering, actually), which may cause "newer" records with an
> identical time stamp to be processed after others... in my current
> design, i'm using "first come, first served", meaning that the first
> record processed is the one that is retained... with dupes, this doesn't
> matter so much, but it all still comes into play...


Then my trick does not work for you, because it hides the fact that the
records are identical. (It is probably also why your record numbers come
out as -1: IndexOf relies on Search finding an item for which Compare
returns 0, and with duplicates allowed Compare now never does.) Someone
has to be responsible for assigning the secondary key. If I understand you
correctly, there is already something that could serve as a tertiary key,
but it doesn't really work because the way it is filled in is not
consistent across the different sources. If I were you, I'd keep this
tertiary key data (I guess it is meaningful, so you can't remove it), and
I'd create my own secondary key inside the TSortedCollection descendant.
I'd use two compare functions:
 - one that works like your current one but wouldn't be declared as the
   collection's Compare method (you could call it CheckPrimaryExists), and
   which would return 0 if the primary key already exists;
 - and one that uses both the primary and the secondary key, as Jim
   suggested.

The algorithm (when duplicates are allowed) would be something like:
if primary key exists
    then set secondary key to a new number
    else set secondary key to 0
insert the data

Note that, depending on the total number of rows, you could simply use one
global counter for the secondary key; there is no need to fetch the last
secondary key used for the same primary key.
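A minimal sketch of the algorithm above, using the Objects unit's TSortedCollection. The record type TRec, its fields Primary (standing in for the time stamp) and Secondary, the descendant TRecColl, its NextSecondary counter and the Add helper are all hypothetical names. CheckPrimaryExists is a plain linear scan here only to keep the sketch short.

```pascal
program SecondaryKeyTwoChecks;

{$mode objfpc}{$H+}

uses
  Objects;

type
  { Hypothetical record: Primary stands in for the time stamp,
    Secondary is the key the collection generates itself. }
  PRec = ^TRec;
  TRec = record
    Primary: string[20];
    Secondary: LongInt;
  end;

  PRecColl = ^TRecColl;
  TRecColl = object(TSortedCollection)
    NextSecondary: LongInt;  { one global counter for all primary keys }
    constructor Init(ALimit, ADelta: Sw_Integer);
    function CheckPrimaryExists(Item: PRec): Boolean;
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
    procedure Insert(Item: Pointer); virtual;
    procedure FreeItem(Item: Pointer); virtual;
  end;

constructor TRecColl.Init(ALimit, ADelta: Sw_Integer);
begin
  inherited Init(ALimit, ADelta);
  Duplicates := True;
  NextSecondary := 1;
end;

{ Primary-key-only check; NOT the collection's Compare. }
function TRecColl.CheckPrimaryExists(Item: PRec): Boolean;
var
  I: Sw_Integer;
begin
  Result := False;
  for I := 0 to Count - 1 do
    if PRec(At(I))^.Primary = Item^.Primary then
      Exit(True);
end;

{ The real Compare: primary key first, then the generated secondary key. }
function TRecColl.Compare(Key1, Key2: Pointer): Sw_Integer;
var
  A, B: PRec;
begin
  A := PRec(Key1);
  B := PRec(Key2);
  if A^.Primary < B^.Primary then Result := -1
  else if A^.Primary > B^.Primary then Result := 1
  else if A^.Secondary < B^.Secondary then Result := -1
  else if A^.Secondary > B^.Secondary then Result := 1
  else Result := 0;
end;

procedure TRecColl.Insert(Item: Pointer);
begin
  if CheckPrimaryExists(PRec(Item)) then
  begin
    PRec(Item)^.Secondary := NextSecondary;
    Inc(NextSecondary);
  end
  else
    PRec(Item)^.Secondary := 0;
  inherited Insert(Item);
end;

{ Items are plain records, not TObject descendants, so the default
  FreeItem (which disposes items as objects) is replaced. }
procedure TRecColl.FreeItem(Item: Pointer);
begin
  Dispose(PRec(Item));
end;

procedure Add(Coll: PRecColl; const Key: string);
var
  R: PRec;
begin
  New(R);
  R^.Primary := Key;
  Coll^.Insert(R);
end;

var
  Coll: PRecColl;
  I: Sw_Integer;
begin
  New(Coll, Init(10, 10));
  Add(Coll, 'A'); Add(Coll, 'B'); Add(Coll, 'B');
  Add(Coll, 'C'); Add(Coll, 'C');
  for I := 0 to Coll^.Count - 1 do
    WriteLn(PRec(Coll^.At(I))^.Primary, ' / ', PRec(Coll^.At(I))^.Secondary);
  Dispose(Coll, Done);
end.
```

The counter is shared by all primary keys, so duplicate numbers are unique but not consecutive per key.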

You'd get something like (primary key / secondary key)
A / 0
B / 0
B / 1 (duplicate detected, first secondary key generated)
C / 0
C / 2 (duplicate detected, second secondary key generated; note that there
is no C / 1)
...

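... and now that I think of it, you don't need two compare functions: the
second one should work for both usages. Since the first entry for any
primary key always gets secondary key 0, searching for (primary key, 0)
with it tells you whether the primary key already exists.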
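A sketch of the single-compare variant, again with hypothetical names (TRec, TRecColl, NextSecondary, Add). Because the first entry for each primary key always carries secondary key 0, the collection's own Search with the composite Compare doubles as the primary-key check, which is a binary search rather than a scan. It assumes entries are never deleted; deleting a (key, 0) entry would break that check.

```pascal
program SecondaryKeySingleCompare;

{$mode objfpc}{$H+}

uses
  Objects;

type
  PRec = ^TRec;
  TRec = record
    Primary: string[20];   { stands in for the time stamp }
    Secondary: LongInt;    { generated by the collection }
  end;

  PRecColl = ^TRecColl;
  TRecColl = object(TSortedCollection)
    NextSecondary: LongInt;
    constructor Init(ALimit, ADelta: Sw_Integer; AllowDups: Boolean);
    function Compare(Key1, Key2: Pointer): Sw_Integer; virtual;
    procedure Insert(Item: Pointer); virtual;
    procedure FreeItem(Item: Pointer); virtual;
  end;

constructor TRecColl.Init(ALimit, ADelta: Sw_Integer; AllowDups: Boolean);
begin
  inherited Init(ALimit, ADelta);
  Duplicates := AllowDups;
  NextSecondary := 1;
end;

function TRecColl.Compare(Key1, Key2: Pointer): Sw_Integer;
var
  A, B: PRec;
begin
  A := PRec(Key1);
  B := PRec(Key2);
  if A^.Primary < B^.Primary then Result := -1
  else if A^.Primary > B^.Primary then Result := 1
  else if A^.Secondary < B^.Secondary then Result := -1
  else if A^.Secondary > B^.Secondary then Result := 1
  else Result := 0;
end;

procedure TRecColl.Insert(Item: Pointer);
var
  I: Sw_Integer;
begin
  PRec(Item)^.Secondary := 0;
  { Searching for (Primary, 0) succeeds only if the primary key exists. }
  if Duplicates and Search(KeyOf(Item), I) then
  begin
    PRec(Item)^.Secondary := NextSecondary;
    Inc(NextSecondary);
  end;
  { With Duplicates = False, (Primary, 0) is found here and the inherited
    Insert skips the item; it does not dispose of it. }
  inherited Insert(Item);
end;

procedure TRecColl.FreeItem(Item: Pointer);
begin
  Dispose(PRec(Item));
end;

procedure Add(Coll: PRecColl; const Key: string);
var
  R: PRec;
begin
  New(R);
  R^.Primary := Key;
  Coll^.Insert(R);
end;

var
  Coll: PRecColl;
  I: Sw_Integer;
begin
  New(Coll, Init(10, 10, True));
  Add(Coll, 'A'); Add(Coll, 'B'); Add(Coll, 'B');
  Add(Coll, 'C'); Add(Coll, 'C');
  for I := 0 to Coll^.Count - 1 do
    WriteLn(PRec(Coll^.At(I))^.Primary, ' / ', PRec(Coll^.At(I))^.Secondary);
  Dispose(Coll, Done);
end.
```

With the inserts A, B, B, C, C this should print the same sequence as the example above: A / 0, B / 0, B / 1, C / 0, C / 2.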

-- 
Frederic Da Vitoria
(davitof)

Member of April - "promoting and defending free software" -
http://www.april.org