03-13-2012, 12:48 PM #6
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Look at the word count calibre plugin; it shows how to extract text from books in any format in calibre.

The problem with using a paragraph is, once again, one of identification. The algorithm has to come up with a "signature" for the book, and that signature has to be calculated independently for every instance of the book. How are you going to ensure that the algorithm picks the same paragraph in every instance of the book? IOW, say your algorithm picks paragraph number 23 in the epub version of the book and sends it to the server as the signature. Now the algorithm is running on another computer, with no access to what happened on the first one. How will it know to pick the same paragraph of the same book when it sends its signature to the server?
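To make the identification problem concrete, here is a minimal Python sketch, assuming each copy of the book has already been reduced to a list of paragraph strings. The helper names (`normalize`, `paragraph_hash`, `signature_by_index`, `signature_by_content`) and the choice of SHA-1 are hypothetical, for illustration only. Picking "paragraph N" breaks as soon as one copy has an extra front-matter paragraph. Picking by content (here, the paragraph with the smallest hash) removes the dependence on position, but it still only agrees when both copies contain the same set of paragraphs.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    # Collapse runs of whitespace and lowercase, so line wrapping and
    # spacing differences between renditions do not change the hash.
    return re.sub(r"\s+", " ", paragraph).strip().lower()


def paragraph_hash(paragraph: str) -> str:
    # Hypothetical signature of a single paragraph: SHA-1 of its normalized text.
    return hashlib.sha1(normalize(paragraph).encode("utf-8")).hexdigest()


def signature_by_index(paragraphs: list[str], index: int) -> str:
    # Position-based choice ("paragraph number 23"): the result depends on
    # how many paragraphs happen to precede the target in this rendition.
    return paragraph_hash(paragraphs[index])


def signature_by_content(paragraphs: list[str]) -> str:
    # Content-based choice: take the smallest hash among non-empty paragraphs.
    # Two machines agree only if they see the same set of paragraphs.
    # Raises ValueError if every paragraph is empty.
    return min(paragraph_hash(p) for p in paragraphs if normalize(p))


if __name__ == "__main__":
    epub = ["Chapter One", "It was a dark and stormy night.", "The rain fell in torrents."]
    # Same text, but with an extra front-matter paragraph and different wrapping.
    mobi = ["Copyright 2012", "Chapter  One", "It was a dark and\nstormy night.", "The rain fell in torrents."]

    # Index 1 points at different text in the two copies.
    print(signature_by_index(epub, 1) == signature_by_index(mobi, 1))
    # Without the extra paragraph, the content-based choice agrees.
    print(signature_by_content(epub) == signature_by_content(mobi[1:]))
```

Running this prints `False` and then `True`. With the copyright paragraph included, `signature_by_content(mobi)` may differ from the epub result, because that extra paragraph can have the smallest hash. That is the same identification problem in a different form.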