03-22-2020, 09:57 AM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Jun 2018
Device: Kindle for Samsung Tablets
|
Duplicate Detection for "Add Books" is too weak
calibre's duplicate detection when adding books is far, far too weak to ever allow it to automatically ignore or merge things it flags as duplicates. It appears to me that only the titles are considered, but I'm not convinced it is performing an exact match on the titles, either.
Regardless, far, far too many short titles are used by different authors for different books, and "Add Books" always marks them INCORRECTLY as duplicates. I wish we could have at least a couple of simple options to control this (so that folks who like it the way it is [if there are any] can keep it). An option to include an author match would be the primary new feature and would resolve most issues. To be fancier, ways to handle multiple authors, title patterns, and different actions for exact matches and "near"/"possible" matches would all be nice. Or maybe there is already a plug-in that repairs this functionality? Is that even possible? If the developer(s) are reading this, please, PLEASE consider improving this feature! |
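To make the request concrete, here is a minimal sketch of what an "include author match" option could look like. It is not calibre's code: `normalize`, `is_duplicate`, the `match_authors` flag, and the sample library are all hypothetical names made up for illustration.

```python
import re

def normalize(text):
    """Lowercase and drop punctuation so trivial differences are ignored."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())

def is_duplicate(new_book, existing_books, match_authors=True):
    """Return True if new_book matches any existing book.

    Books are (title, [authors]) tuples. match_authors is the hypothetical
    option: when False, a title match alone counts as a duplicate.
    """
    new_title = normalize(new_book[0])
    new_authors = {normalize(a) for a in new_book[1]}
    for title, authors in existing_books:
        if normalize(title) != new_title:
            continue
        if not match_authors:
            return True
        # Require at least one author in common (covers multi-author books).
        if new_authors & {normalize(a) for a in authors}:
            return True
    return False

# Made-up sample data: same short title, different authors.
library = [("Legacy", ["Author One"]), ("Star Paths", ["Herbert A. Patrick"])]
incoming = ("Legacy", ["Author Two"])

print(is_duplicate(incoming, library, match_authors=False))  # True
print(is_duplicate(incoming, library, match_authors=True))   # False
```

With the option off, the title-only behaviour is kept for anyone who prefers it; with it on, a shared short title is no longer enough to flag a duplicate.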
03-22-2020, 10:04 AM | #2 |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Every time you add a book, the entire library has to be scanned for duplicates. This is waaaaay too slow for large libraries if the algorithm is made more flexible. Duplicate detection is not going to be made stronger. Simply add the duplicates and use the Find Duplicates plugin if you need better algorithms.
|
03-22-2020, 10:11 AM | #3 |
Well trained by Cats
Posts: 29,768
Karma: 54401244
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Things like this are why many of us use an INTAKE Library.
1) We beat the metadata into shape. Refine the tags.
2) Do any other cleaning tasks.
3) Use the Find Duplicates plugin and run the Find Library Duplicates option against the destination Library. |
03-22-2020, 01:35 PM | #4 | |
Bibliophagist
Posts: 35,280
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Quote:
What theducks said. I suspect we've all had a new ebook where the embedded information from the publisher is something like Star Paths: A novel of space colonization by Herbert A Patrick, and you find you already have a duplicate called Star Paths by Herbert A. Patrick after you clean up the metadata and run Find Duplicates. You could also use the Find Duplicates plugin's various search options to find the book without cleaning up the metadata first. |
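The Star Paths case can be sketched in a few lines. This is only an illustration of why the raw publisher metadata fails an exact comparison and what a looser comparison might do; `main_title` and `strip_periods` are hypothetical helpers, not part of calibre or the Find Duplicates plugin.

```python
def main_title(title):
    """Keep only the text before the first colon, trimmed and case-folded."""
    return title.split(":", 1)[0].strip().casefold()

def strip_periods(name):
    """Drop periods and collapse whitespace so 'A.' and 'A' compare equal."""
    return " ".join(name.replace(".", " ").split()).casefold()

embedded = ("Star Paths: A novel of space colonization", "Herbert A Patrick")
cleaned = ("Star Paths", "Herbert A. Patrick")

# An exact comparison sees two different books.
print(embedded == cleaned)  # False

# A looser comparison treats them as the same book.
same = (main_title(embedded[0]) == main_title(cleaned[0])
        and strip_periods(embedded[1]) == strip_periods(cleaned[1]))
print(same)  # True
```

Cutting at the first colon is a blunt rule: it would also merge two genuinely different books that share a main title and author but differ in subtitle, so it is better suited to a review list than to automatic merging.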
|
03-22-2020, 04:20 PM | #5 | |
null operator (he/him)
Posts: 20,544
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
When a duplicate is found in step 3, you may want to see the book that exists in the destination library, i.e. its cover, metadata and format files. You can do that via the calibre-spy plugin which provides read-only access to calibre libraries. BR |
|
03-27-2020, 12:48 PM | #6 | |
Junior Member
Posts: 9
Karma: 10
Join Date: Jun 2018
Device: Kindle for Samsung Tablets
|
Quote:
For reference, I've been a programmer for a little over 40 years and I need to do something similar [I think] fairly often, and as long as everything's in memory it can be done pretty quickly. In fact, by sorting the test cases, too, quite a lot of the initial searching can also be minimized. |
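A rough sketch of the idea, assuming all titles already sit in memory: sort both the library titles and the incoming titles once, then walk the two lists together in a single pass. The function name and sample titles are made up, and this says nothing about how calibre actually stores or scans its library.

```python
def find_title_matches(library_titles, incoming_titles):
    """Return incoming titles that also exist in the library (case-insensitive).

    Both lists are sorted once, then compared in one merged pass, so no
    incoming title triggers a fresh scan of the whole library.
    """
    lib = sorted({t.casefold() for t in library_titles})
    inc = sorted({t.casefold() for t in incoming_titles})
    matches = []
    i = j = 0
    while i < len(lib) and j < len(inc):
        if lib[i] == inc[j]:
            matches.append(inc[j])
            i += 1
            j += 1
        elif lib[i] < inc[j]:
            i += 1  # library title sorts earlier; advance the library side
        else:
            j += 1  # incoming title has no match; advance the incoming side
    return matches

print(find_title_matches(["Dune", "Star Paths", "Legacy"],
                         ["legacy", "New Book", "DUNE"]))
# ['dune', 'legacy']
```

The cost is dominated by the two sorts, and the library side only needs re-sorting when the library changes. Whether that holds up once fuzzier matching is added is a separate question.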
|
03-27-2020, 03:55 PM | #7 |
Bibliophagist
Posts: 35,280
Karma: 145435140
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Forma, Clara HD, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
Sadly, using the author in comparing when adding books would not be all that useful, given how often the authors' names are gibbled. As an example, how would you handle an author whose name is Henry Beam Piper, best known as H. Beam Piper, when the ebook creator has used H Beam Piper, H. Beam Piper, Henry B Piper, Henry B. Piper, H B Piper, HB Piper, H. B. Piper and H.B. Piper? This is one of the reasons that I import books into an intake library, where I edit the metadata, check for duplicates and so forth before moving to my main library.
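To illustrate the trade-off with the Piper variants, here is a sketch of one very loose author key: surname plus first initial. `author_key` is a hypothetical helper, it assumes "First Last" name order, and it is not what calibre or any plugin does.

```python
import re

def author_key(name):
    """Reduce a 'First [Middle] Last' name to (surname, first initial)."""
    tokens = re.sub(r"[^\w\s]", " ", name).lower().split()
    if not tokens:
        return ("", "")
    return (tokens[-1], tokens[0][0])

variants = [
    "H Beam Piper", "H. Beam Piper", "Henry B Piper", "Henry B. Piper",
    "H B Piper", "HB Piper", "H. B. Piper", "H.B. Piper",
]

# Every variant from the post collapses to the same key.
print({author_key(v) for v in variants})  # {('piper', 'h')}

# But the key is lossy: an unrelated, made-up author collides too.
print(author_key("Harold Piper"))  # ('piper', 'h')
```

A key loose enough to catch every variant also produces false matches, and "Piper, H. Beam" (surname first) would need separate handling, which is the sort of mess that cleaning up metadata in an intake library avoids.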
|
03-27-2020, 06:45 PM | #8 | |
null operator (he/him)
Posts: 20,544
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Kovid will always consider patches, or you could write a File Type plugin specific to your needs; these get used when a book is added. BR |
|
03-27-2020, 07:20 PM | #9 |
null operator (he/him)
Posts: 20,544
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
|
03-27-2020, 10:20 PM | #10 | |
creator of calibre
Posts: 43,835
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Quote:
|
|
Tags |
add books, duplicates |