Quote:
Originally Posted by jackie_w
I don't know whether it's better or faster but the calibre plugin 'Count Pages' contains some code for extracting book text into a big string. It uses it when calculating a wordcount for the book.
It's definitely faster: a quick one-off test shows that spawning ebook-convert is about five times slower.
Unfortunately it is not better, and indeed not good enough. I have some files that look like they were generated as epub files by Microsoft Word, and for these the count_pages algorithm produces text about four times larger than the output of ebook-convert. (A quick glance shows thousands of font-family entries that count_pages has not removed.)
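For illustration, here is a minimal sketch of the kind of filtering that seems to be missing. It assumes the stray font-family entries come from `<style>` blocks in the Word-generated XHTML, which I have not confirmed. This is neither the Count Pages code nor what ebook-convert does internally; `TextExtractor` and `visible_text` are hypothetical names, and only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <style> or <script>.

    Hypothetical helper for illustration; not taken from Count Pages.
    """

    SKIP = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # CSS rules (font-family and friends) arrive here as plain data,
        # so they must be dropped explicitly.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with a space and collapse whitespace runs.
        # Caveat: a word split by an inline tag ("wor<b>ld</b>") becomes
        # two words, which slightly inflates a word count.
        return " ".join(" ".join(self._chunks).split())


def visible_text(html):
    """Return the text of one HTML/XHTML file without CSS or script content."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<html><head><style>p.MsoNormal { font-family: "Times New Roman"; }'
    "</style></head>\n"
    '<body><p class="MsoNormal">Hello <b>world</b></p></body></html>'
)
print(visible_text(sample))
```

Running something like this over each content file of an epub and comparing the total length with the ebook-convert output would show whether the `<style>` blocks really account for the fourfold difference.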