MobileRead Forums - View Single Post

kiwidude · 04-19-2011, 04:45 AM

Quote:

Originally Posted by Starson17

Thinking out loud here - Suppose I tell you that AuthorA and AuthorB are not the same, even though the algorithm sees them as similar. Can I then say anything about whether BookA by AuthorA and BookB by AuthorB are the same? I suppose not. Father and son write a book, but I've got format 1 under Father's name and Format 2 under the son's name.

It is a good question whether the author exemption list should cross over into the book find algorithms. My first instinct was to say that it should. Your example, if I understand it correctly, is the result of a metadata entry error: the book has been given the wrong author. It just so happens that you may see it appearing in duplicate searches because father and son share a similar name.

So suppose I have Steve Smith and S. Smith as authors, and I decide from a duplicate author search that these are not duplicate authors. As I am displaying all books by those two authors at once before I make that exemption, that is my opportunity to make sure that any wrong author values on individual books between the two are rectified (this is where the Search the Internet plugin with Fantastic Fiction is gold to me).

Then if it happened to be the case that both authors had written a book with a title that is similar enough to appear in a duplicate search, you might argue that it should automatically be excluded, as you have already said the author sets are distinct.

However, if we did this I see a potential issue: at some point in future you add another format of this book to your library, and once again it has the wrong author value on it. Now you will never see it appear as a duplicate unless you remove the author exclusions. That is a bit nasty and subtle.
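To make that concern concrete, here is a rough Python sketch. It is not the plugin's code: `find_duplicate_books`, the tuple layout and the book title are all made up for illustration, and the title match is a crude case-insensitive compare that ignores the author.

```python
from itertools import combinations


def find_duplicate_books(books, author_exemptions=frozenset(),
                         apply_author_exemptions=False):
    """Hypothetical 'title match, ignore author' search.

    books is a list of (book_id, title, author) tuples.
    author_exemptions is a set of frozensets of author names the user
    has marked as NOT being the same author.
    """
    pairs = []
    for a, b in combinations(books, 2):
        # Crude title comparison, purely for illustration.
        if a[1].strip().lower() != b[1].strip().lower():
            continue
        # The behaviour under discussion: skip the pair because the
        # two authors were previously marked as distinct.
        if apply_author_exemptions and frozenset((a[2], b[2])) in author_exemptions:
            continue
        pairs.append((a[0], b[0]))
    return pairs


exemptions = {frozenset(("Steve Smith", "S. Smith"))}
library = [
    (1, "The Long Road", "Steve Smith"),
    (2, "The Long Road", "S. Smith"),  # new format imported with the wrong author
]

hidden = find_duplicate_books(library, exemptions, apply_author_exemptions=True)
shown = find_duplicate_books(library, exemptions, apply_author_exemptions=False)
print(hidden)  # the mis-authored format is silently skipped
print(shown)   # the pair is still offered for review
```

With the exemptions applied, the wrongly tagged second format never shows up; without them, the user still gets the chance to spot it.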

Note that unless you run the 'xxx title, ignore author' book algorithms you are unlikely to have an overlap for the above scenario, as it needs a fuzzier author match, which is only offered for author-based searches, not book ones. Similar author just does punctuation and comma name flipping. And there must be a relatively small % of books in the world which are written with an exact enough title match by different authors that have such subtly different author names. So I think it is safer not to apply the author exclusion list to book searches, and to let the user make book-based exemptions instead. At least that way, if they import books in future with the wrong name on, they have a chance of picking that up from a duplicate book-based search. Not if it is a new title of course, but there is only so much we can do!
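As a rough illustration of why the overlap is unlikely, here is a sketch of a "punctuation and comma name flipping" style of author match. This is an assumption about the general idea only, not the plugin's actual implementation, and `similar_author_key` is a made-up name.

```python
import re


def similar_author_key(name):
    """Hypothetical normalisation: flip 'Last, First', drop punctuation,
    lowercase and collapse whitespace."""
    if "," in name:
        last, _, first = name.partition(",")
        name = f"{first.strip()} {last.strip()}"
    name = re.sub(r"[^\w\s]", " ", name)  # replace punctuation with spaces
    return " ".join(name.lower().split())


print(similar_author_key("Smith, Steve") == similar_author_key("Steve Smith"))
print(similar_author_key("S. Smith") == similar_author_key("Steve Smith"))
```

Under a match like this, "Smith, Steve" and "Steve Smith" collapse to the same key, but "S. Smith" and "Steve Smith" do not, so a book search that uses it would not pair up their books in the first place.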