Look at the word count calibre plugin; it will show you how to extract text from books of any format in calibre.
The problem with using a paragraph is once again one of identification. The algorithm is going to come up with a "signature" for the book, and that signature has to be calculated independently for every instance of the book. How are you going to ensure that the algorithm picks the same paragraph in every instance of the book? IOW, your algorithm picks paragraph number 23 in the epub version of the book and sends it as the signature to the server. Now the algorithm is running on another computer, where it has no access to what happened on the first computer. How will it know to pick the same paragraph for the same book when it sends the signature to the server?
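To make the problem concrete, here is a rough sketch of why a position like "paragraph 23" is fragile, and of one possible way around it: choose the paragraph by its content instead of its position. This is only an illustration, not something calibre or the plugin does. The names `normalize` and `signature`, the `min_words` cutoff, and the sample paragraphs are all made up, and the approach assumes text extraction yields the same paragraph boundaries in every format, which may not hold in practice.

```python
import hashlib
import re


def normalize(paragraph: str) -> str:
    # Lowercase and collapse everything except ASCII letters and digits,
    # so case, punctuation and whitespace differences between formats vanish.
    return re.sub(r"[^a-z0-9]+", " ", paragraph.lower()).strip()


def signature(paragraphs, min_words=8):
    # Hash every sufficiently long paragraph and keep the smallest digest.
    # The choice depends only on content, never on paragraph position.
    digests = [
        hashlib.sha256(norm.encode("utf-8")).hexdigest()
        for norm in map(normalize, paragraphs)
        if len(norm.split()) >= min_words
    ]
    return min(digests) if digests else None


# Hypothetical extractions of the same book from two formats. The second
# one has an extra front-matter line, so every position is shifted by one.
epub = [
    "Chapter One",
    "The harbour was silent that morning, and nobody on the quay had seen the ship arrive.",
    "By noon the rumours had already reached the upper town and the market stalls.",
]
mobi = [
    "Title page inserted by converter",
    "CHAPTER ONE",
    "The harbour was silent that morning,  and nobody on the quay had seen the ship arrive.",
    "By noon the rumours had already reached the upper town and the market stalls.",
]

same_position = epub[1] == mobi[1]              # positional pick disagrees
same_signature = signature(epub) == signature(mobi)  # content pick agrees
```

Picking by position breaks as soon as one format gains or loses a line, while the content-based pick survives that. It still fails if two formats split or merge paragraphs differently, so it does not remove the identification problem, it only narrows it.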