Quote:
Originally Posted by jorm
Seems that the content of the book is the only truly unique way to associate your copy of the book with mine as the same book if we can't both find the ISBN.
|
The problem is that you can't always map contents back to the container (here, a book).
Mathematically speaking, the mapping from contents to containers is not a function, because the same content can sit in several containers. Put the other way round, a container does not map injectively to any single item it holds, so the mapping is generally not invertible, i.e. the container/book is not always uniquely determined:
With the sentence approach you could identify a self-contained unit (story, romance, poem), but not which container (book/anthology/collection) it is in, because the same story can be contained in more than one book.
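To make the ambiguity concrete, here is a minimal Java sketch. The class `StoryIndex`, the method `booksByStory`, and the titles in the usage note are made up for illustration; nothing here comes from an existing tool. It inverts a book-to-stories table and shows that one story can point back to several books.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: inverts "book -> stories" into "story -> books".
class StoryIndex {
    static Map<String, Set<String>> booksByStory(Map<String, List<String>> storiesByBook) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : storiesByBook.entrySet()) {
            for (String story : entry.getValue()) {
                // A story may already be known from another book, so collect a set of books.
                result.computeIfAbsent(story, k -> new TreeSet<>()).add(entry.getKey());
            }
        }
        return result;
    }
}
```

If "Anthology A" holds "Story 1" and "Story 2", and "Collection B" holds "Story 1" as well, then looking up "Story 1" returns both books, so a match on that story alone cannot tell you which book you have.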
To identify a container, one has to consider the hash values of all the items in it (that is how the hash of e.g. Java's List is computed). The problem is: how can you split a container's content into its elements? Perhaps there is a blank page as a separator between the items, but you can't rely on that. You also can't know a priori whether it is a collection of different stories or a collection of chapters belonging to the same story. I think it would be better to process the TOC somehow.
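As a rough sketch of the "hash of all items" idea, the code below combines per-item hashes with the same recurrence that `java.util.List.hashCode()` documents (start at 1, then `31 * h + itemHash` for each element in order). `ContainerHash` and `combine` are hypothetical names, and it assumes the items have already been split out of the book, which is exactly the hard part. A 32-bit `int` is also far too small for real fingerprinting; a real tool would more likely use a cryptographic digest.

```java
import java.util.List;

// Hypothetical helper: derives a container fingerprint from its items' hashes.
class ContainerHash {
    // Same recurrence as List.hashCode(): order-sensitive, null items count as 0.
    static int combine(List<String> items) {
        int h = 1;
        for (String item : items) {
            h = 31 * h + (item == null ? 0 : item.hashCode());
        }
        return h;
    }
}
```

Because the result depends on every item and on their order, two anthologies that share one story still get different fingerprints, but only if both were split into items the same way.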
There could also be "foreign content" in a book, like quotes or proverbs. So picking a single sentence might lead you to identify a different book.