02-15-2023, 10:09 AM   #5
sgmoore
Quote:
Originally Posted by jackie_w
I don't know whether it's better or faster but the calibre plugin 'Count Pages' contains some code for extracting book text into a big string. It uses it when calculating a wordcount for the book.
It's definitely faster (a quick one-off test shows that spawning ebook-convert is about five times slower).

Unfortunately, it is not better, and indeed not good enough. I have some files that look as though they were generated as EPUB files by Microsoft Word, and for these the count_pages algorithm produces text about four times larger than the ebook-convert output. (A quick glance shows thousands of font-family entries that count_pages has not removed.)
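
To illustrate the kind of filtering that seems to be missing, here is a minimal sketch using only Python's standard html.parser. It is not the Count Pages code and not what ebook-convert does internally. It assumes (and this is only a guess) that the stray font-family entries come from <style> blocks in the Word-generated markup; the names VisibleTextExtractor and extract_visible_text are made up for this example.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <style> or <script>.

    Hypothetical helper for illustration only; not part of calibre
    or of the Count Pages plugin.
    """

    SKIPPED_TAGS = {"style", "script"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        # Inline style="..." attributes arrive in attrs and are never text,
        # so only the element name matters here.
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # CSS inside <style> reaches this method as plain data,
        # so it has to be dropped explicitly.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse whitespace runs so the character count is not inflated.
        return " ".join("".join(self._chunks).split())


def extract_visible_text(html):
    """Return the readable text of one HTML/XHTML document as a string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    '<html><head><style>p { font-family: "Calibri"; }</style></head>'
    '<body><p style="font-family: Arial">Hello <b>world</b></p></body></html>'
)
print(extract_visible_text(sample))
```

If the font-family entries turn out to come from somewhere else (Word-specific XML islands, for instance), the SKIPPED_TAGS set would need extending, so treat this as a starting point rather than a fix.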