View Single Post
Old 03-13-2012, 11:22 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 45,400
Karma: 27756918
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Setting up some kind of book fingerprint algorithm would be an interesting challenge. Off the top of my head, you could use:

Set of all proper nouns (defined as words with the first letter capitalized that are not at the start of a sentence). There would need to be some metric over the space of such sets that allows for close but not perfect matches.

I dont think you would have much success with a random sentence, as picking the same sentence in different formats of the books will be difficult, for example, the MOBI format could have a table of contents embedded at the begining, or a calibre conversion of the book could have an embedded metadata jacket.
kovidgoyal is offline   Reply With Quote