Setting up some kind of book fingerprint algorithm would be an interesting challenge. Off the top of my head, you could use:
The set of all proper nouns (defined roughly as capitalized words that are not at the start of a sentence). You would need a similarity metric over such sets that tolerates close but not perfect matches, since two copies of the same book will rarely yield identical sets.
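As a rough illustration of the idea above, here is a minimal Python sketch. The function names `proper_noun_set` and `jaccard` are made up for this example, the sentence splitting is deliberately naive, and Jaccard similarity is just one assumed choice of metric, not the only reasonable one.

```python
import re


def proper_noun_set(text):
    """Collect capitalized words that are not the first word of a sentence.

    Heuristic only: sentences are split on ., ! or ? followed by whitespace,
    so abbreviations, dialogue and the pronoun "I" will produce noise.
    """
    nouns = set()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        # Skip the first word: it is capitalized whether or not it is a name.
        for word in words[1:]:
            if word[0].isupper():
                nouns.add(word)
    return nouns


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|, in [0, 1]."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    plain = "Alice met Bob in Paris. They walked to the Louvre."
    # Same text with some extra front matter, as a converted file might have.
    converted = "Table of Contents. " + plain
    print(jaccard(proper_noun_set(plain), proper_noun_set(converted)))
```

Two files would then be treated as the same book when the score exceeds some threshold. The threshold would have to be tuned on real data, and very short books or books with few names would likely need a different signal.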
I don't think you would have much success with a random sentence, because picking the same sentence across different formats of a book is difficult. For example, the MOBI format could have a table of contents embedded at the beginning, or a calibre conversion of the book could have an embedded metadata jacket.