Originally Posted by kovidgoyal
Not really, it's a pretty simple problem: say you have 20,000 books in your database and are adding 100 new books.
Then calibre has to make 20,000 * 100 comparisons to check for duplicates. That's presumably what's slowing it down.
Surely it need only make log2(20,000) * 100 ≈ 1,500 comparisons, right? Keep the library keys sorted and binary-search each new book instead of scanning the whole list. And you could optimize even that by hashing whatever you are using for the comparison first, so each lookup is O(1) on average. And...
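A minimal sketch of the three approaches, assuming the duplicate check boils down to comparing some per-book key (the actual key calibre compares, e.g. a normalized title/author pair, is a hypothetical here):

```python
import bisect

# Hypothetical keys standing in for whatever calibre compares.
library = [f"book-{i:05d}" for i in range(20_000)]
new_books = [f"book-{i:05d}" for i in range(19_950, 20_050)]  # 50 dupes, 50 new

# Naive: 20,000 comparisons per new book -> 20,000 * 100 total.
dupes_naive = [b for b in new_books if b in library]

# Binary search: log2(20,000) ~ 14 comparisons per new book,
# but the library keys must be kept sorted.
library_sorted = sorted(library)

def is_dupe(key):
    i = bisect.bisect_left(library_sorted, key)
    return i < len(library_sorted) and library_sorted[i] == key

dupes_bisect = [b for b in new_books if is_dupe(b)]

# Hashing: build a set once, then each membership test is O(1) average.
library_set = set(library)
dupes_hashed = [b for b in new_books if b in library_set]

assert dupes_naive == dupes_bisect == dupes_hashed
print(len(dupes_hashed))  # number of duplicates found
```

All three find the same duplicates; only the comparison count changes, and for a one-off import the set build plus 100 hash lookups is hard to beat.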
Care to exchange a bit of email?