Old 08-03-2012, 06:36 PM   #303
joolzt
Junior Member
joolzt began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2011
Location: Scotland
Device: Samsung Galaxy Tab
Quote:
Originally Posted by kiwidude View Post
@joolzt - thx for the donation btw.
No prob, it wasn't intended as a bribe for a reply; it was genuine thanks for saving me loads of time and speeding up removal of dupes.

Quote:
Originally Posted by kiwidude View Post
Re (2) Not something I can replicate here,...... and the highlighting mode (red slash across the blue bars on button next to saved searches) will get displayed as being turned off........
Hadn't noticed the button with blue bars before and hadn't known what it was for as I don't use most of the Calibre functionality, but I must have hit it by accident as #2 was fixed as soon as I clicked it. Duh! Sorry for wasting your time on that one.


Quote:
Originally Posted by kiwidude View Post
(1) The filename is completely irrelevant when it comes to identifying duplicates. Two files which have the same CRC using the SHA hash computed by this plugin most definitely are duplicates. This plugin can pick up books which have been incorrectly catalogued in a user's library by differences in title/author which will result in a different filename, hence why it has no relevance to whether a book is considered a duplicate or not.
I realise the primary CRC search doesn't need to look at author or title; I just meant to use one or the other as a double check before deleting. But I note your point that your check is stronger than the CRC check in WhereIsIt!, so I had another look at why I thought a few files had been marked as dupes when they weren't.
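For anyone following along, here is a minimal sketch of what "duplicate by content, regardless of filename" means. It is not the plugin's actual code: the function names are made up for illustration, and SHA-256 is an assumption (the quote above only says "SHA hash").

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file's contents, read in chunks.

    SHA-256 is an assumption here; the plugin may use a different hash.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_binary_duplicates(paths):
    """Group paths by content digest and keep digests shared by 2+ files.

    The filename never enters the comparison, so two identical files
    catalogued under different authors or titles still end up together.
    """
    by_digest = defaultdict(list)
    for p in paths:
        by_digest[file_digest(p)].append(Path(p))
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

Two byte-identical files land in the same group whatever they are called, which is why a mis-catalogued copy still gets caught.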

I repeated the search. I set it to show all groups at once and to sort by the number of duplicates, so I assumed that the results would show groups of duplicate files together. So when I saw this.....

author 1 series 1 title 1 size 0.1
author 1 series 1 title 1 size 0.1
author 2 series 2 title 2 size 8.0
author 3 series 3 title 3 size 6.6
author 3 series 3 title 3 size 6.6

... I realised that item 3 couldn't possibly match those before or after, so I jumped to the conclusion that a few files were being picked up as dupes in error. Most of my huge list of dupes (99%+) appeared together in groups which were clearly sets of duplicates, even if there were punctuation differences or missing series names, so these out-of-order files made me wary.

Now that I can browse 'author 2' correctly, I see that there are definitely two 'author 2 series 2 title 2 size 8.0' books, but they must just be sorted in different places in the original list.

I understand that there may be multiple duplicate matches for a book record with several formats in it, but the third file above was a pdf that didn't relate to those it was sorted with, so it got my systems analysis nose twitching.

I'd hoped that, once my dupes are down to a manageable level, I'd be able to sort dupes in groups and skim through looking for anomalies (as I did above) before searching again with the auto delete on.

My assumption that the list is sorted in groups must be wrong, but it's still a great tool.
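To illustrate what I had expected, here is a rough sketch of a sort that keeps group members next to each other. I don't know how the plugin or Calibre actually orders the list; the group ids and row layout below are hypothetical. The point is that a sort on the number of duplicates alone leaves ties between equal-sized groups, so a tie-break on the group id is needed to keep each group in one run.

```python
from collections import Counter

# Hypothetical rows: (group_id, author, series, title, size_mb).
rows = [
    (1, "author 1", "series 1", "title 1", 0.1),
    (2, "author 2", "series 2", "title 2", 8.0),
    (1, "author 1", "series 1", "title 1", 0.1),
    (3, "author 3", "series 3", "title 3", 6.6),
    (2, "author 2", "series 2", "title 2", 8.0),
    (3, "author 3", "series 3", "title 3", 6.6),
]


def sort_by_group(rows):
    """Largest groups first; group id as tie-break so members stay adjacent."""
    sizes = Counter(r[0] for r in rows)
    return sorted(rows, key=lambda r: (-sizes[r[0]], r[0]))


def groups_are_contiguous(rows):
    """Return True if every group id appears in one unbroken run."""
    seen, previous = set(), None
    for r in rows:
        gid = r[0]
        if gid != previous:
            if gid in seen:
                return False  # this group already ended earlier in the list
            seen.add(gid)
        previous = gid
    return True
```

With the sample rows, `groups_are_contiguous(rows)` is false for the unsorted list and true after `sort_by_group(rows)`.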

Of course, I'm still wary of an auto-delete based on CRC alone. If I have identical books filed under different authors and titles due to an error, and they are sorted apart from each other in the list, I wouldn't notice the differing title/author within the same 'dupe group', and a computer couldn't decide which was correct. So I come back to my original point: an optional secondary check before an auto-delete might be a good thing.
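The kind of secondary check I have in mind could look something like the sketch below. It is only an illustration of the idea, not a feature of the plugin: the record layout (dicts with 'author' and 'title') and both function names are assumptions. A hash group is treated as safe to auto-delete only when the metadata agrees after ignoring case, accents and punctuation; anything else is left for a manual look.

```python
import re
import unicodedata


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def safe_to_auto_delete(group):
    """group: list of dicts with 'author' and 'title' for one hash group.

    Returns True only if every record agrees after normalisation, so a
    group with conflicting metadata is held back for manual review.
    """
    keys = {(normalise(b["author"]), normalise(b["title"])) for b in group}
    return len(keys) == 1
```

Punctuation-only differences such as "Smith, John" versus "Smith John" still count as agreement, while a genuinely different author or title holds the group back.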

Then again, once I have few enough dupes to show and process one group at a time, this becomes irrelevant. It's just that I have far too many dupes at the moment to do that, which is why I am using 'show all', saving to disc in batches and using WhereIsIt! for bulk deduping. Your plug-in is still a great help as it picks up all the probable dupes for me to check, and once my lib is clean I can maintain it with the plug-in alone.

Thanks for creating this plug-in, it's a great help.