MobileRead Forums

kovidgoyal · 03-13-2012, 11:22 AM

Setting up some kind of book fingerprint algorithm would be an interesting challenge. Off the top of my head, you could use:

The set of all proper nouns (defined as capitalized words that do not start a sentence). There would need to be some metric over the space of such sets that allows for close but not perfect matches.
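A rough Python sketch of the proper-noun idea follows. It assumes plain text has already been extracted from each format. The regex sentence splitter is deliberately naive (abbreviations such as "Mr." will confuse it), the pronoun "I" is skipped by hand, and Jaccard similarity is only one possible choice for the "close but not perfect" metric. None of these details come from the post itself.

```python
import re

# Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# A word is a run of Unicode letters, optionally joined by ' or - (e.g. O'Brien).
_WORD = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")


def proper_nouns(text: str) -> set[str]:
    """Collect capitalized words that are not the first word of a sentence."""
    nouns: set[str] = set()
    for sentence in _SENTENCE_END.split(text):
        words = _WORD.findall(sentence)
        # Skip the first word: it is capitalized whether or not it is a name.
        for word in words[1:]:
            # "I" is always capitalized but is not a proper noun.
            if word[0].isupper() and word != "I":
                nouns.add(word)
    return nouns


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 1.0 means identical sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    epub_text = "Alice walked to Paris with Bob. They met Carol in Rome."
    # Simulate a MOBI copy with a table of contents prepended.
    mobi_text = "Contents. Chapter One. " + epub_text
    score = jaccard(proper_nouns(epub_text), proper_nouns(mobi_text))
    print(f"similarity = {score:.2f}")  # prints "similarity = 0.80"
```

The prepended table of contents adds a spurious "One" to the set, so the two copies score 0.80 rather than 1.0. This is why a tolerant metric is needed; any cutoff for declaring a match would have to be tuned on real books.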

I don't think you would have much success with a random sentence, since picking the same sentence from different formats of the same book will be difficult. For example, the MOBI format could have a table of contents embedded at the beginning, or a calibre conversion of the book could have an embedded metadata jacket.
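A small Python illustration of why position-based sentence picking is fragile. The `sentence_at` helper and the jacket text are hypothetical stand-ins, not calibre's actual jacket format. The point is only that any prepended front matter shifts sentence indices, so the "same" position yields a different sentence.

```python
import re

# Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_at(text: str, index: int) -> str:
    """Hypothetical fingerprint: the sentence at a fixed position in the text."""
    return _SENTENCE_END.split(text.strip())[index]


body = "It was a cold morning. Anna opened the door. The street was empty."
epub_text = body
# Simulated front matter, standing in for a metadata jacket or embedded ToC.
jacket_text = "Title: Example. Author: Someone. " + body

print(sentence_at(epub_text, 1))    # prints "Anna opened the door."
print(sentence_at(jacket_text, 1))  # prints "Author: Someone."
```

Two extra sentences at the front are enough to make the fingerprints disagree, even though the book body is identical.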
