Quote:
Originally Posted by kovidgoyal
Every time you add a book, the entire library has to be scanned for duplicates; if the algorithm is made more flexible, this is waaaaay too slow for large libraries. Duplicate detection is not going to be made stronger. Simply add the duplicates and use the duplicate finder plugin if you need better algorithms.
May I ask why it is so slow? All of the strings that need to be checked appear to already be in memory (the title and authors are in the list of books in the library, after all), and a binary search on the two fields would not seem particularly slow, since uniques should fall out quickly. Would adding an option to include the author(s) in the compare really make it much slower? After all, you would only even GET to the author compare AFTER the title was found to be a duplicate.
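To make that concrete, here is a minimal sketch of the kind of thing I mean, assuming the library fits in memory as a list of (title, authors) pairs, case-folded and kept sorted by title. The names and data layout are my own illustration, not calibre's actual structures:

Code:
import bisect

# Hypothetical in-memory index: (title, authors) pairs, case-folded
# and sorted by title. Illustrative only -- not calibre's real code.
library = sorted([
    ("dune", frozenset({"frank herbert"})),
    ("it", frozenset({"stephen king"})),
])

def is_duplicate(title, authors):
    key = title.casefold()
    want = frozenset(a.casefold() for a in authors)
    # Binary search on the title: unique titles fall out in O(log n).
    i = bisect.bisect_left(library, (key,))
    # Only when a title matches do we ever reach the author compare.
    while i < len(library) and library[i][0] == key:
        if library[i][1] == want:
            return True
        i += 1
    return False

print(is_duplicate("Dune", ["Frank Herbert"]))      # True
print(is_duplicate("Dune", ["Kevin J. Anderson"]))  # False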
For reference, I've been a programmer for a little over 40 years, and I need to do something similar [I think] fairly often; as long as everything is in memory it can be done pretty quickly. In fact, by sorting the test cases too, quite a lot of the initial searching can be minimized as well.
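A sketch of that refinement, assuming the same sorted (title, authors) representation as above: with the incoming batch sorted as well, one forward-only merge pass finds every duplicate without restarting the search for each new book, so the whole batch costs O(n + m) after the sorts.

Code:
# Both lists sorted by case-folded title, same representation as above.
def duplicates_in_batch(library, incoming):
    dupes = []
    i = 0
    for title, authors in incoming:
        # Skip library titles sorting before this one; i never moves back.
        while i < len(library) and library[i][0] < title:
            i += 1
        # Several library books can share a title; check the whole run.
        j = i
        while j < len(library) and library[j][0] == title:
            if library[j][1] == authors:
                dupes.append((title, authors))
                break
            j += 1
    return dupes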