This plugin will help you to identify duplicate authors, titles, formats, series, publishers, tags and identifiers in your Calibre libraries.
- Duplicate authors are where you have multiple variants of an author due to spacing, punctuation, spelling differences or word order. e.g. Kevin Anderson / Kevin J. Anderson / Keven Anderson / Anderson, Kevin / Anderson Kevin / Bloggs, Joe & Anderson, Kevin
- Duplicate titles are where you have multiple book entries with either the same or varying titles. e.g. Martian Way / The Martian Way / The Martian Way (2010) / The Martian Way and Other Stories
- Duplicate formats are where the contents of a particular format like ePub are binary identical (byte-for-byte) to another format file in your library.
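To illustrate what "binary identical" means in practice, here is a minimal Python sketch that groups files whose contents match exactly. It is not the plugin's own code. It assumes you already have the format file paths in hand. The size-then-SHA-256 strategy and the names `file_digest` and `find_binary_duplicates` are illustrative choices, and two files with equal SHA-256 digests are treated as identical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_binary_duplicates(paths):
    """Group paths whose file contents are identical (equal SHA-256 digests).

    Files are bucketed by size first, which is cheap, and only files that
    share a size are hashed. Returns a list of groups of two or more paths.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[Path(p).stat().st_size].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # a file with a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[file_digest(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups
```

Bucketing by file size first means only files that could possibly match are read and hashed, which matters on large libraries.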
The plugin offers a variety of matching algorithms for finding possible groups of duplicate candidates. Each algorithm combination offers a different tradeoff between the number of genuine duplicates found and the number of false positives (near duplicates).
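As a rough illustration of that tradeoff, the following Python sketch builds a hypothetical "similar" match key for titles and authors and groups entries that share a key. It is not the plugin's actual algorithm. The normalisation rules (ignore case, punctuation, a leading English article, bracketed suffixes, single-letter initials and word order) are assumptions chosen to fit the examples above.

```python
import re
from collections import defaultdict

LEADING_ARTICLES = ("the", "a", "an")  # assumption: English articles only


def similar_title_key(title):
    """Hypothetical 'similar' title key: case, punctuation, a leading
    article and any bracketed suffix such as '(2010)' are ignored."""
    t = title.lower()
    t = re.sub(r"[\(\[].*?[\)\]]", " ", t)  # drop (...) and [...] segments
    t = re.sub(r"[^\w\s]", " ", t)          # punctuation becomes a space
    words = t.split()
    if words and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)


def similar_author_key(author):
    """Hypothetical 'similar' author key: ignores case, punctuation,
    single-letter initials and word order."""
    words = re.sub(r"[^\w\s]", " ", author.lower()).split()
    words = [w for w in words if len(w) > 1]  # drop initials such as 'j'
    return " ".join(sorted(words))            # sorting makes order irrelevant


def group_candidates(items, key_func):
    """Group items sharing the same key; only groups of 2+ are candidates."""
    groups = defaultdict(list)
    for item in items:
        groups[key_func(item)].append(item)
    return [g for g in groups.values() if len(g) > 1]
```

With the sample names from earlier, this key groups Kevin Anderson / Kevin J. Anderson / Anderson, Kevin / Anderson Kevin but misses the misspelt Keven Anderson, and it leaves The Martian Way and Other Stories on its own. Catching those needs a looser algorithm such as soundex or fuzzy matching, at the cost of more false positives.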
When the search is complete, the results are presented as groups for you to navigate through. For each group you can do one of three things:
- If the group contains genuine duplicates, use the existing Merge feature in the Edit metadata menu to resolve the duplicate book entries.
- If the group contains non-duplicates, you can mark the group as exempt to prevent those books or authors from appearing together in future searches.
- Skip the group for now and move to the next one, either to defer your decision or so that you can mark all remaining groups as exempt when finished.
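Exemptions can be pictured as a record of pairs of books that you have declared are not duplicates. The following is a hypothetical Python sketch of that idea. The class name `ExemptionStore`, the pairwise storage, and the rule that a group is hidden only when every pair in it is exempt are all assumptions, not the plugin's documented behaviour.

```python
from itertools import combinations


class ExemptionStore:
    """Hypothetical store of pairwise exemptions between book ids."""

    def __init__(self):
        self._pairs = set()  # each entry is (smaller_id, larger_id)

    def mark_group_exempt(self, book_ids):
        # Record every pair in the group as "not duplicates of each other".
        for a, b in combinations(sorted(book_ids), 2):
            self._pairs.add((a, b))

    def is_exempt(self, a, b):
        return (min(a, b), max(a, b)) in self._pairs

    def filter_groups(self, groups):
        """Drop groups in which every pair of books has been exempted."""
        kept = []
        for group in groups:
            pairs = combinations(group, 2)
            if not all(self.is_exempt(a, b) for a, b in pairs):
                kept.append(group)
        return kept
```

Under this model, marking all remaining groups as exempt is simply a loop calling `mark_group_exempt` on each group, and a newly added book that matches an exempt one still surfaces because its own pairs have not been exempted.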
New to version 1.4 is a "Find metadata variations" menu, which lets you find variations of author, publisher, series and tag names and rename them directly from its dialog. Again, a number of different matching algorithms are available.
Version 1.5 added the ability to perform duplicate comparisons across multiple libraries. For instance, if you have a "working" library and a "main" library, you can search for duplicates between them using the same range of algorithms and produce a report for later resolution.
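The cross-library comparison can be sketched as an index lookup: build match keys for one library once, then probe with each book of the other. This minimal Python sketch assumes books arrive as hypothetical `(book_id, title)` tuples (the plugin reads them through calibre's database, which is not shown here), and like the plugin it only produces a report rather than changing either library.

```python
from collections import defaultdict


def cross_library_report(source_books, target_books, key_func):
    """List books in the source library whose match key also occurs in the target.

    Both arguments are iterables of (book_id, title) tuples. Nothing is
    modified; the result is a list of report lines for later resolution.
    """
    # Index the target library once so each source lookup is a dict probe.
    index = defaultdict(list)
    for book_id, title in target_books:
        index[key_func(title)].append(book_id)

    lines = []
    for book_id, title in source_books:
        matches = index.get(key_func(title))  # .get does not insert empty keys
        if matches:
            lines.append(f"{title!r} (id {book_id}) matches target ids {matches}")
    return lines
```

For example, `cross_library_report(working_books, main_books, str.casefold)` lists every working-library title that also appears in the main library, ignoring case. Any other key function, such as a "similar" or soundex key, can be passed instead.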
Main Features of v1.6.1:
- Search either your entire library or only the books matching any search restriction in effect when you run Find Duplicates.
- Choose your desired combination of title and author matching from any of "identical", "similar", "soundex", "fuzzy" or "ignore" algorithms.
- Choose alternative algorithms such as matching identifiers or binary comparison.
- View the results either one group at a time, or showing all duplicate candidates at once using highlighting to show the groups.
- When doing author duplicate searches (ignore title), optionally highlight the authors under consideration in the tag browser for ease of renaming
- Sort the result groups either by title/author (default) or by the size of the group
- Fine-tune the soundex algorithm options to make matching "fuzzier" or more exact.
- Optionally include the languages field when comparing titles, so that books which intentionally share a title in different languages are not shown as duplicates.
- Optionally have binary duplicate formats automatically removed from your library when doing a binary comparison.
- Mark the current group, or all groups, as exempt from appearing as duplicates again.
- Review your duplicate exemptions, with the option to reverse an exemption so those books or authors are considered for duplicates again.
- Exempt either individual books (title searches) or authors (author searches)
- Clicking the clear search button, setting a different restriction or choosing an explicit Clear duplicate results menu option will exit duplicate search mode.
- Switching libraries or restarting Calibre will also clear any duplicate search results. Your exemptions are remembered and stored per library.
- Customize the keyboard shortcuts for a number of the menu options.
- Find metadata variations for authors, publishers, series and tags to eradicate unwanted duplicates with an alternative simplified UI to rename them.
- Find duplicates across multiple libraries, producing a report.
- When placed on the toolbar, clicking the toolbar button without duplicate groups displayed will display the Find Duplicates options dialog. When results are displayed, clicking on the button will move to the next result. Ctrl+click or shift+click to navigate to the previous result.
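Soundex, one of the algorithms listed above, encodes a word by its consonant sounds so that spelling variants such as Kevin / Keven share a code. The following Python sketch implements standard American Soundex with the code length as a parameter. That the plugin's "fuzzier or more explicit" tuning corresponds to code length is an assumption, as is the per-word `soundex_author_key` helper.

```python
# Letter-to-digit table for American Soundex; vowels, h, w and y have no code.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word, length=4):
    """Return the Soundex code of `word`, truncated or zero-padded to `length`."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = [letters[0].upper()]
    prev = _CODES.get(letters[0], "")  # first letter's code suppresses a repeat
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result.append(code)
        if ch not in "hw":  # h and w do not separate letters with the same code
            prev = code
    return "".join(result)[:length].ljust(length, "0")


def soundex_author_key(author, length=4):
    """Hypothetical author key: sorted Soundex codes of each name word,
    ignoring commas, full stops and single-letter initials."""
    words = author.replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(soundex(w, length) for w in words if len(w) > 1))
```

Shorter codes are fuzzier: "Anderson" and "Andrews" both encode to A536 at length 4 but differ at length 6 (A53625 vs A53620), so a longer code yields fewer false positives but can also miss genuine variants.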
Requires Calibre 0.8.59 or later.
Paypal Donations:
- If you find this or any of my other plugins useful, please feel free to show your appreciation. I have spent many hundreds of unpaid hours on their development and support, so any encouragement to continue is appreciated!
Version 1.6.1 - 03 Jan 2013
Fix for when comparing library duplicates to ensure saved searches are not corrupted.
Version 1.6.0 - 29 Oct 2012
Change "ISBN Compare" to "Identifier" with a dropdown allowing comparison of any identifier field.
Add a context menu to the metadata variations list to allow choosing the selected name on the right side.
Version 1.5.3 - 14 Aug 2012
When using "Find library duplicates" display all duplicate matches for the current library as marked:duplicate (except for author duplicates)
Version 1.5.2 - 21 Jul 2012
When using "Find library duplicates" clear the current search in order to compare the entire restricted library
When using "Find metadata variations" and showing books, fire the search again to ensure results reflect the search
Version 1.5.1 - 21 Jul 2012
Add a "Save log" button for the "Find library duplicates" result screen.
Version 1.5.0 - 20 Jul 2012
Add a "Find library duplicates" option for cross-library duplicate comparisons into a log report
If a duplicate book search is active when a metadata variation search is run, clear the search first
Version 1.4.0 - 17 Jul 2012
Now requires calibre 0.8.59
Add a Find metadata variations option to search for author, series, publisher and tag variations, and allow renaming them from the dialog.
Fix fuzzy author comparisons, which no longer compute a reverse hash, reducing the false positives they generated
Version 1.3.0 - 22 Jun 2012
Now requires calibre 0.8.57
Store configuration in the calibre database rather than a json file, to allow reuse from different computers (not simultaneously!)
Add a support option to the configuration dialog allowing viewing the plugin data stored in the database
Add an option to allow automatic removal of binary duplicates (does not delete book records, only the newest copies of that format).
Version 1.2.3 - 02 Dec 2011
Make the languages comparison optional (default false) via a checkbox on the Find Duplicates dialog
Version 1.2.2 - 25 Nov 2011
Take the languages field into account when doing title based duplicate comparisons
Version 1.2.1 - 12 Nov 2011
When selecting ISBN or Binary compare, hide the Title/Author groupbox options
Some cosmetic additions to the text for ISBN/Binary options
Version 1.2.0 - 11 Sep 2011
Fix bug for when switching to an ignore title search where author search was previously set to ignore.
Remove customisation of shortcuts on tab, to use Calibre's centrally managed shortcuts instead.
Version 1.1.4 - 04 Jul 2011
Additional fix for stuff broken by Calibre 0.8.8 in the tag view
Fix for removing an author exemption
Version 1.1.3 - 03 Jul 2011
Preparation for deprecation of db.format_abspath() for networked backend
Version 1.1.2 - 03 Jul 2011
Fix for issue with Calibre 0.8.8 tag browser search_restriction refactoring
Version 1.1.1 - 12 Jun 2011
Add "van" to the list of ignored author words
Fix bug of error dialog not referenced correctly
Version 1.1 - 3 May 2011
Add support for binary comparison searches to find book formats with exactly the same content
Change how exemptions are stored in the config file to make them more scalable
For performance, no longer calculate detailed exemption preview messages for the confirmation dialog
Compare multiple authors for most author algorithms to increase duplicate coverage.
Change the Manage exemptions dialog to have a tab for each author with exemptions, and show each section only if it has exemptions
Include swapped author name order in all but the identical author checks, so A B / B A or A,B / B,A will match.
Disable the Ignore title, identical author combination, as it is not a valid one (it never finds duplicates)
Allow the remove, mark current and mark all group exemption dialogs to be hidden from showing again.
Allow the various result count and no-result information dialogs to be hidden from showing again.
Allow user to reset confirmation dialogs related to find duplicates from the configuration dialog
Version 1.0 - 26 Apr 2011
Initial release of Find Duplicates plugin