MobileRead Forums

kumbaja · 08-13-2010, 09:44 AM

I've taken a look at the code today, too, and I can more or less confirm your initial assessment, except that I think you are missing one step.

Quote:

* Results from isbndb and google books are merged by isbn. The first database queried takes precedence in case of conflict.
* Title is an exact match (although code comments imply titles starting with a -match)
* books with coverart
* books with longer descriptions

IMO there are two big problems with the current implementation. First, the merge is problematic, because potentially good data from the second data source is dropped without any inspection.
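To make the merge issue concrete, here is a minimal Python sketch of the behaviour described in the quote (results merged by ISBN, first source wins). It is not the actual calibre code: the record dicts, their field names (`isbn`, `title`, `comments`, `has_cover`), the `merge_by_isbn` helper and the placeholder ISBN are all hypothetical.

```python
def merge_by_isbn(*sources):
    """Merge result lists from several sources, keyed by ISBN (hypothetical sketch).

    The first source that returns a given ISBN wins outright; any later
    record with the same ISBN is discarded without looking at its fields.
    """
    merged = {}
    for results in sources:
        for record in results:
            # setdefault keeps the record already stored and silently drops the new one
            merged.setdefault(record["isbn"], record)
    return list(merged.values())


# Hypothetical results: the first source has no description and no cover,
# the second source has both, for the same (placeholder) ISBN.
isbndb_results = [
    {"isbn": "0000000000", "title": "Example Book", "comments": "", "has_cover": False},
]
google_results = [
    {"isbn": "0000000000", "title": "Example Book",
     "comments": "A long, useful description.", "has_cover": True},
]

merged = merge_by_isbn(isbndb_results, google_results)
print(merged[0]["comments"] == "", merged[0]["has_cover"])  # True False
```

The second source's description and cover flag never make it into the merged record, even though they are strictly better than what the first source returned.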
Second, the title comparison only considers one title better than another if, and only if, it equals the search query title (ignoring some common stop words and case). If two result titles both differ from the queried title, they are considered "equally bad", and only cover art and description length are then used for sorting.
In other words, if no result has an exact title match, the remaining results are ordered by cover art and description regardless of title (and of author and publisher, btw).
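A similar sketch of the sort order described above shows how an unrelated title with cover art can outrank a close title without one. Again this is hypothetical and only mirrors the description: the `STOP_WORDS` set, `normalize`, `sort_key`, `rank` and the sample records are made up for illustration.

```python
STOP_WORDS = {"a", "an", "the"}  # hypothetical stop-word list


def normalize(title):
    # Case-insensitive comparison with a few common stop words removed.
    return " ".join(w for w in title.lower().split() if w not in STOP_WORDS)


def sort_key(record, query_title):
    # Smaller tuples sort first: exact title match, then has cover art,
    # then longer description. Title only matters if it matches exactly.
    exact = normalize(record["title"]) == normalize(query_title)
    return (not exact, not record["has_cover"], -len(record["comments"]))


def rank(records, query_title):
    return sorted(records, key=lambda r: sort_key(r, query_title))


records = [
    {"title": "Dune Messiah", "has_cover": False, "comments": "Sequel."},
    {"title": "Cooking for Beginners", "has_cover": True, "comments": "Recipes and more recipes."},
]
ranked = rank(records, "Dune")
print([r["title"] for r in ranked])  # ['Cooking for Beginners', 'Dune Messiah']
```

Neither title matches "Dune" exactly, so both are "equally bad" on title and the cover flag alone decides the order.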
IMO we should consider some kind of distance metric for title matching, such as Levenshtein distance, Jaccard similarity or even TF/IDF (see: http://www.dcs.shef.ac.uk/~sam/stringmetrics.html).
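For comparison, here is a rough sketch of two of those metrics in plain Python: a textbook dynamic-programming Levenshtein distance and a word-set Jaccard similarity. The `rank_by_similarity` function is a hypothetical illustration of using a graded title score in place of the exact-match flag, so that cover art and description length only break ties; word-level Jaccard is just one possible choice, and character n-grams or TF/IDF weighting could be plugged in the same way.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) ca
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Jaccard similarity of the lower-cased word sets of two titles (0.0 to 1.0)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def rank_by_similarity(records, query_title):
    # Hypothetical ranking: most similar title first; cover art and
    # description length are only used to break ties.
    return sorted(records, key=lambda r: (-jaccard(r["title"], query_title),
                                          not r["has_cover"],
                                          -len(r["comments"])))


records = [
    {"title": "Dune Messiah", "has_cover": False, "comments": "Sequel."},
    {"title": "Cooking for Beginners", "has_cover": True, "comments": "Recipes and more recipes."},
]
print(levenshtein("dune", "dune messiah"))  # 8
print(jaccard("Dune", "Dune Messiah"))      # 0.5
print([r["title"] for r in rank_by_similarity(records, "Dune")])
# ['Dune Messiah', 'Cooking for Beginners']
```

With a graded score the near match now ranks first, even though it has no cover art.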