08-03-2012, 03:02 PM #301
joolzt
Junior Member
joolzt began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2011
Location: Scotland
Device: Samsung Galaxy Tab
Hope I'm in the right place. I posted a bug fix for the duplicate finder plug-in but in the wrong place (on the Calibre support site), and the 'plug-in forum thread' link from user plug-ins didn't bring me here, so I did a manual forum search.

Please redirect me if I'm still in the wrong place :-( Otherwise I hope someone can help. Here's my post:

Great tool, saved me a lot of time, but two points:

1 - The binary check is not quite 100%

I don't think it's safe to have a checkbox for 'automatically remove duplicates', as you could be deleting books that aren't duplicates. The plug-in selects different books as 'duplicates' about 0.1 percent of the time or less, but I don't want to delete any books that are not genuine duplicates. So I use this to identify possible dupes, but still save the books to disc and use WhereIsIt! to dedupe them, as it lets you include a check on file name as well as CRC. I think your plug-in info says you guarantee duplicates are found; you should change the wording so people know.

Suggestion: if your 'automatically delete' function were set to delete only if the book title or author is the same, that would probably reduce the chance of errors to an infinitesimal amount, and I would use it to auto-delete. You can easily do a manual review of the 'duplicates' that weren't auto-deleted, fix book name/author errors, then run the tool again.

2 - Bug in 'clear duplicate results'

I've used your tool several times: it found loads of duplicates, I cleared the results OK, then did another search. Unfortunately 'clear duplicate results' doesn't seem to work any more, so I can't use Calibre normally now. I cleared the duplicate results, but when I click on an author or series I see every single book in the window, not just that author's books. If I do an author search via the command line I get the same thing.

Attempts to fix: I shut down/restarted Calibre, updated the plug-in, updated Calibre, but it's still the same. There must be something in the background settings that still thinks I want duplicates found. I'm reluctant to completely uninstall and reinstall Calibre and hope it won't come to this. I just tried to disable the plugin in 'user plugins' to see if that would clear it and put Calibre back to normal, but it says it can't be disabled :-(

Using Windows Vista, Calibre 0.8.62. I can provide screen caps etc.

Hope you can fix as this plug-in is a great idea.
08-03-2012, 03:49 PM #302
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,606
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@joolzt - thx for the donation btw.

(1) The filename is completely irrelevant when it comes to identifying duplicates. Two files which have the same CRC using the SHA hash computed by this plugin most definitely are duplicates. This plugin can pick up books which have been incorrectly catalogued in a user's library with differences in title/author, which result in a different filename; hence the filename has no relevance to whether a book is considered a duplicate or not.

Re (2): Not something I can replicate here; it clears every time, regardless of whether I show one group at a time or all groups at once. You can either hit Escape, click on the clear search button next to Go, or click on "Clear duplicate results" on the Find Duplicates menu. In all those circumstances the search restriction gets cleared and the highlighting mode (red slash across the blue bars on the button next to saved searches) will be displayed as turned off. That is *assuming* you had it turned off before you entered find duplicates mode (as it restores whatever "state" you were in prior to using the plugin). So if you had a search restriction or highlighting mode turned on before you entered find duplicates, that is exactly what will get restored when you clear out of it.
08-03-2012, 06:36 PM #303
joolzt
Junior Member
joolzt began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Aug 2011
Location: Scotland
Device: Samsung Galaxy Tab
Quote:
Originally Posted by kiwidude
@joolzt - thx for the donation btw.
No prob, it wasn't intended as a bribe for a reply; it was genuine thanks for saving me loads of time and speeding up removal of dupes.

Quote:
Originally Posted by kiwidude
Re (2) Not something I can replicate here,...... and the highlighting mode (red slash across the blue bars on button next to saved searches) will get displayed as being turned off........
Hadn't noticed the button with blue bars before and hadn't known what it was for as I don't use most of the Calibre functionality, but I must have hit it by accident as #2 was fixed as soon as I clicked it. Duh! Sorry for wasting your time on that one.


Quote:
Originally Posted by kiwidude
(1) The filename is completely irrelevant when it comes to identifying duplicates. Two files which have the same CRC using the SHA hash computed by this plugin most definitely are duplicates. This plugin can pickup books which have been incorrectly catalogued in a users library by differences in title/author which will result in a different filename, hence why it has no relevance to whether a book is considered a duplicate or not.
I realise the primary CRC search doesn't need to look at author or title; I just meant to use one or the other as a double check before deleting. But I note your point that your check is stronger than the CRC check in WhereIsIt!, so I had another look at why I thought there were a few files marked as dupes that weren't.

I repeated the search. I set it to show all groups at once and to sort by the number of duplicates, so I assumed that the results would show groups of duplicate files together. So when I saw this.....

author 1 series 1 title 1 size 0.1
author 1 series 1 title 1 size 0.1
author 2 series 2 title 2 size 8.0
author 3 series 3 title 3 size 6.6
author 3 series 3 title 3 size 6.6

... and realised that item 3 couldn't possibly match those before or after, so I jumped to the conclusion that a few files were being picked up as dupes in error. Most of my huge list of dupes (99%+) appeared together in groups which were clearly sets of duplicates, even if there were punctuation differences or missing series names, so these out of order files made me wary.

Now that I can browse 'author 2' correctly, I see that there are definitely two 'author 2 series 2 title 2 size 8.0' books, but they must just be sorted in different places in the original list.

I understand that there may be multiple duplicate matches for a book record with several formats in it, but the third file above was a pdf that didn't relate to those it was sorted with, so it got my systems analysis nose twitching.

I'd hoped that, once my dupes are down to a manageable level, I'd be able to sort dupes in groups and skim through looking for anomalies (as I did above) before searching again with the auto delete on.

My assumption that the list is sorted in groups must be wrong, but it's still a great tool.

Of course, I'm still wary of an autodelete based on CRC. If I have identical books but with different authors and titles due to an error and they are sorted apart from each other on the list then I wouldn't notice different title/author in the same 'dupe group' and a computer couldn't decide which was correct, so I come back to my original point that having an optional secondary check before an auto delete might be a good thing.

Then again, once I have few enough dupes to show and process one group at a time this becomes irrelevant, it's just that I have far too many dupes at the moment to do that, which is why I am using 'show all' and saving to disc in batches and using WhereIsIt! for bulk deduping. Your plug-in is still a great help as it picks up all the probable dupes for me to check, and once my lib is clean I can maintain it just with the plug-in.

Thanks for creating this plug-in, it's a great help.
08-03-2012, 07:05 PM #304
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,606
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
Hey, I don't mind if it was a bribe. It sure beats people who come to the threads and *demand* that an xyz feature or change must be made...

Glad to hear you got #2 sorted. As for the deletion, please take *no notice* of the "size" column. It is an utterly meaningless column (I don't ever bother displaying it) unless you only ever store one book format. It is only ever going to show you the size of the largest format, and even that is only at a "point in time" - editing the files using external tools like Sigil will make that number out of date.

This plugin compares the exact file size in bytes of the physical file, and only if those match does it then do the next step of computing and comparing an SHA hash. As I mentioned above - when it says you have a binary duplicate, it really *is* a duplicate.
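To make the two-step check above concrete, here is a minimal Python sketch. It assumes the book files are plain paths on disk and uses SHA-256 as the hash; the post only says "an SHA hash", so the variant and the plugin's actual code may differ. The function names are hypothetical, not the plugin's API.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path, chunk_size=1 << 16):
    """Hash a file in chunks so large books are never read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_binary_duplicates(paths):
    """Return groups of paths whose files are byte-for-byte duplicates.

    Step 1: bucket by exact size in bytes (cheap, no file contents read).
    Step 2: only for buckets holding 2+ files, compare a SHA-256 hash.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a binary duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha256_of_file(p)].append(p)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return groups
```

Checking size first means most files are never read at all; only same-size candidates pay the cost of hashing, and the filename plays no part in the decision.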

The auto-delete function simply removes one of those binary copies, it doesn't touch your book records in calibre, so you lose zero data. I only added the feature as a convenience for users for two reasons:

(1) Since it is 100% safe to remove the duplicate file, it automates something that users would otherwise spend a lot of effort going through one by one to do.

(2) It is impossible in the calibre GUI to show *which* format is the binary duplicate in the scenario where both book records have several of the same formats. This leaves the user confused, trying to work out which format is safe to delete.

So... since it is 100% safe to delete them, I really don't think adding another dialog in there is necessary. That checkbox option to delete them is turned off by default in the plugin, but there is absolutely no downside to turning it on.
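A minimal sketch of point (2) above, assuming each book record has already been reduced to a mapping of format name to file hash (a hypothetical input shape; the plugin's internals are not described in this thread). It keeps one copy of each identical file, here arbitrarily the one in the lowest book id, and lists only the other format files for removal, so no book record is ever removed. The function name is hypothetical.

```python
from collections import defaultdict


def plan_binary_duplicate_removals(records):
    """Return (book_id, format) pairs whose file could be removed.

    records: {book_id: {format_name: sha_digest}}  (hypothetical input shape)
    For every digest that occurs more than once, the first copy seen
    (lowest book_id, then format name) is kept and the rest are listed.
    Only format files are listed; no book record is ever removed.
    """
    copies_by_digest = defaultdict(list)
    for book_id in sorted(records):
        for fmt in sorted(records[book_id]):
            copies_by_digest[records[book_id][fmt]].append((book_id, fmt))

    removals = []
    for copies in copies_by_digest.values():
        removals.extend(copies[1:])  # copies[0] is the one that is kept
    return sorted(removals)


records = {
    1: {"EPUB": "aaa", "MOBI": "bbb"},
    2: {"EPUB": "aaa", "MOBI": "ccc"},  # same EPUB bytes as book 1, different MOBI
    3: {"PDF": "ddd"},
}
print(plan_binary_duplicate_removals(records))  # [(2, 'EPUB')]
```

Because the decision is made per format file rather than per book, it picks out exactly which of several shared formats is the byte-identical one.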

Anyways, enough on this. If you still aren't convinced, simply uncheck the option.
08-04-2012, 10:33 PM #305
BetterRed
null operator (he/him)
BetterRed ought to be getting tired of karma fortunes by now.
 
Posts: 20,460
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by odinokij
Hello and thanks for your work,

I'd like to point out a possible improvement for the Find Duplicates plugin: after doing a "Find library duplicates" you end up with a log of duplicate books. It would be nice if those duplicated books remained selected so we could delete them easily.

It's just a hint, I really would appreciate it.

Thank you very much.

Odinokij.
@odinokij - but which library: the library you are in (Source) or the library you selected for comparison (Target)?

I use the save list feature, print the list, and use a pen to strike out what I've done.

One idea I had was move the duplicates from the two libraries to a third library. Then when you've resolved the conflicts - you move what's left to where you want it. For me that would be a 'nice to have'.

Over to kiwidude

BR

Last edited by BetterRed; 08-05-2012 at 02:43 AM. Reason: clarity
08-06-2012, 02:59 AM #306
odinokij
Enthusiast
odinokij began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Jul 2012
Device: Kindle 3
In my case, I have my "full library" and another one for "massive imports". When I download a new collection of books, I add them to the Import library. Then I want to check for duplicates against my full library, so I can delete the duplicated ones from the import library and finally add to the full library those books that remain in the import library.
So, in my case, the easiest solution would be enough: keep the duplicated books of the open library selected.
08-11-2012, 07:38 AM #307
bigbird1227
Enthusiast
bigbird1227 is a rising star in the heavens
 
Posts: 42
Karma: 13798
Join Date: Feb 2011
Device: kindle 3
find Unique

I was wondering how hard it would be to get a plugin that is the reverse of find duplicates: in other words, one that compares libraries for unique books.

I love the last enhancement allowing you to find duplicates between libraries and would love to be able to find unique books when also comparing libraries.

Many thanks for the great work done so far
08-11-2012, 07:41 AM #308
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
 
 
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by bigbird1227
would love to be able to find unique books when also comparing libraries
I think you need to clarify what you mean by unique. I have 10,000 unique books in my library.
08-12-2012, 04:04 AM #309
bigbird1227
Enthusiast
bigbird1227 is a rising star in the heavens
 
Posts: 42
Karma: 13798
Join Date: Feb 2011
Device: kindle 3
Quote:
Originally Posted by dwanthny
I think you need to clarify what you mean by unique. I have 10,000 unique books in my library.
What I mean is I want to be able to compare libraries and identify unique books in the current library that are not in the library being compared to.

That way, if I am looking at a 4,500-book library that is being compared against my 25,000-book library, it will tell me which books I don't have rather than which are duplicates. In this case the duplicates would probably number about 4,200 and the books I don't have around 300. This plugin would make it easier to identify the 300.
08-13-2012, 06:32 AM #310
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,606
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
v1.5.3 Released

Changes in this release:
  • When using "Find library duplicates" display all duplicate matches for the current library as marked:duplicate (except for author only duplicates)
Note that this display of results for cross-library duplicates of course only shows you duplicates in the current library. So don't get yourself confused if you plan on deleting the results and remove them from the wrong one.

It will display for all search types except for "Ignore Title" (author based) searches. I couldn't see the point in displaying all books for an author in that circumstance.

Re the query about showing books that are not duplicates: after running this check you will now see marked:library_duplicates in the search bar. If you then type "not marked:library_duplicates", this will give you all the books that are not duplicates according to whatever duplicates criteria you searched on. I'm not entirely sure of the use case, but the simple fact is you can do it if you need to.
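The "not marked:library_duplicates" search is simply the complement of the marked set within the current library. The Python sketch below illustrates that idea under stated assumptions: the match key (a case- and punctuation-insensitive title plus authors) is a hypothetical stand-in for whichever duplicate criteria you actually pick in the plugin, and the dict-of-tuples library shape and function names are made up for the example.

```python
import re


def _normalize(text):
    """Lowercase and collapse any run of non-alphanumerics to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def book_key(title, authors):
    """Hypothetical match key: normalized title plus sorted normalized authors."""
    return (_normalize(title), tuple(sorted(_normalize(a) for a in authors)))


def split_by_library_match(current, other):
    """Split current-library book ids into (marked, unmarked).

    current, other: {book_id: (title, [authors])}  (assumed shape)
    marked   ~ books that also appear in the other library
    unmarked ~ what "not marked:library_duplicates" would show
    """
    other_keys = {book_key(t, a) for t, a in other.values()}
    marked = {bid for bid, (t, a) in current.items() if book_key(t, a) in other_keys}
    unmarked = set(current) - marked
    return marked, unmarked


current = {
    1: ("Dune", ["Frank Herbert"]),
    2: ("Emma", ["Jane Austen"]),
    3: ("Dune!", ["frank herbert"]),
}
other = {10: ("dune", ["Frank Herbert"])}
marked, unmarked = split_by_library_match(current, other)
print(sorted(marked), sorted(unmarked))  # [1, 3] [2]
```

Run against an incoming collection as the current library, the unmarked set is exactly the "books I don't have" list asked for earlier in the thread.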
08-14-2012, 04:24 AM #311
odinokij
Enthusiast
odinokij began at the beginning.
 
Posts: 29
Karma: 10
Join Date: Jul 2012
Device: Kindle 3
Thanks a lot kiwidude
08-17-2012, 08:15 AM #312
bigbird1227
Enthusiast
bigbird1227 is a rising star in the heavens
 
Posts: 42
Karma: 13798
Join Date: Feb 2011
Device: kindle 3
Quote:
Originally Posted by kiwidude
Changes in this release:
  • When using "Find library duplicates" display all duplicate matches for the current library as marked:duplicate (except for author only duplicates)
Note that this display of results for cross-library duplicates of course only shows you duplicates in the current library. So don't get yourself confused if you plan on deleting the results and remove them from the wrong one.

It will display for all search types except for "Ignore Title" (author based) searches. I couldn't see the point in displaying all books for an author in that circumstance.

Re the query about showing books that are not duplicates: after running this check you will now see marked:library_duplicates in the search bar. If you then type "not marked:library_duplicates", this will give you all the books that are not duplicates according to whatever duplicates criteria you searched on. I'm not entirely sure of the use case, but the simple fact is you can do it if you need to.
Great work. Here's how I use it. I get a collection of books from someone that has 5,000 books in it. I know that I probably have 4,500 of them already. It's a lot easier for me to identify the 500 I don't have than to get rid of the 4,500 I do have. However, since you revised the program to actually mark the duplicate books, it really doesn't matter that much.

Thanks again
08-23-2012, 08:31 PM #313
chis
Junior Member
chis began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Jul 2012
Device: Kindle Touch
Kiwidude

Thanks for all the awesome plugins! You've made a huge contribution to Calibre usability and helped so many of us.

Thanks for adding the ability to show books in one library that are missing from another. I was about to ask if that was possible.

Suggestion for the Metadata Variations screen (which is excellent for bulk updates of new libraries): When looking at Author variations, enable any of the shown variations to be easily taken as the Rename To name for them all. Some of my authors appear in 3 or more forms. An excellent enhancement would enable clicking on any of the alternate names on the right hand list to copy that name down as the Rename To name.

Last edited by chis; 08-24-2012 at 12:26 AM.
08-24-2012, 05:23 AM #314
kiwidude
calibre/Sigil Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,606
Karma: 2092290
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@chis,

Thanks for the kind words and glad the plugins have been useful to you.

The problem with the selection idea is that clicking on the right-hand side already has a purpose: it controls the ability to select/deselect items, as it is possible that you don't want to rename all of the variations that it found.

The way I envisaged people handling the "this is not the name you were looking for" issue is to scroll down the list on the left-hand side and find that variation there. All the permutations are available in that list.

I can't think of an alternative way just at the moment, though if someone has a suggestion I will consider it.

Enjoy the plugins.
09-09-2012, 07:40 PM #315
BelgarionNL
Member
BelgarionNL began at the beginning.
 
Posts: 21
Karma: 10
Join Date: Jul 2011
Device: Sony PRS 650
please tell me there is a way to say: DELETE ALL DUPLICATES FOUND!

I currently have 1 library with around 40 percent duplicates...
they are 1-on-1 duplicates from an old library!! How do I remove the copies?
Finding them is great, but I am not deleting them 1 by 1.


If there is a way to remove the duplicates PLEASE tell me!

thx

Last edited by BelgarionNL; 09-09-2012 at 08:10 PM.

Tags
cross library duplicates, in library duplicates


Similar Threads
Thread Thread Starter Forum Replies Last Post
[GUI Plugin] Quality Check kiwidude Plugins 1171 03-23-2024 05:18 AM
[GUI Plugin] View Manager kiwidude Plugins 413 03-17-2024 12:01 AM
[GUI Plugin] Open With kiwidude Plugins 402 03-16-2024 11:44 PM
[GUI Plugin] Generate Cover kiwidude Plugins 811 03-16-2024 11:31 PM
[GUI Plugin] Plugin Updater **Deprecated** kiwidude Plugins 159 06-19-2011 12:27 PM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.