Quote:
Originally Posted by nurbles62
May I ask why it is so slow? All of the strings that need to be checked appear to already be in memory (the title and authors are in the list of books in the library, after all) and a binary search on the two fields would not seem to be particularly slow, since uniques should fall out quickly. Would adding an option to include the author(s) in the compare really make it much slower -- after all, you would only even GET to the author compare AFTER the title was found to be a duplicate.
For reference, I've been a programmer for a little over 40 years and I need to do something similar [I think] fairly often, and as long as everything's in memory it can be done pretty quickly. In fact, by sorting the test cases, too, quite a lot of the initial searching can also be minimized.
It's an O(n^2) algorithm vs an O(n) one: a hashmap of normalized title values makes each duplicate check O(1), so the whole library is handled in a single pass instead of comparing every book against every other. As a programmer of 40 years' standing you should understand the consequences. And author names have too much variation to permit a useful normalized O(1) check on them.
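To make the distinction concrete, here is a minimal sketch of the hashmap approach. The normalization rules (`normalize_title` and the sample data) are illustrative assumptions, not calibre's actual implementation:

```python
import re
from collections import defaultdict

def normalize_title(title):
    # Hypothetical normalization: lowercase, strip punctuation,
    # collapse whitespace, drop a leading article.
    t = re.sub(r"[^\w\s]", "", title.lower())
    t = re.sub(r"\s+", " ", t).strip()
    return re.sub(r"^(the|a|an)\s+", "", t)

def find_duplicate_titles(books):
    # One O(n) pass: each normalized title is hashed once, so the
    # per-book check is O(1) rather than a scan over every other book.
    groups = defaultdict(list)
    for book in books:
        groups[normalize_title(book)].append(book)
    return [g for g in groups.values() if len(g) > 1]

books = ["The Hobbit", "hobbit", "Dune", "DUNE!", "Emma"]
print(find_duplicate_titles(books))
# → [['The Hobbit', 'hobbit'], ['Dune', 'DUNE!']]
```

The same trick does not transfer cleanly to author names, which is the point of the reply: "J. R. R. Tolkien", "Tolkien, J.R.R.", and "John Ronald Reuel Tolkien" would all need to normalize to one key, and no simple rule achieves that reliably.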