MobileRead Forums - View Single Post

cryzed · 06-30-2017, 03:35 PM · #1120

Hey, I modified the code a bit. My plan was to speed up the APNX-accurate algorithm, but unfortunately my alternative version performs at about the same speed (+/- 1 second). It might handle books with strange and/or broken markup better, although Calibre should already prevent that during the conversion step.

I also made slight changes in _read_epub_contents() and _extract_body_text() that avoid some unicode conversion steps and use the re module a bit more efficiently (regex_object.search() instead of regex_object.findall(), re.sub() instead of str.replace()).
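
As a rough illustration of the two re changes (this is not the actual patch; the patterns, sample markup and function names below are made up for the example, and whether either variant is really faster should be measured, e.g. with timeit):

```python
import re

# Hypothetical patterns; the plugin's real expressions are not shown in this post.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
CONTROL_WS_RE = re.compile(r"[\r\n\t]")


def extract_body_findall(html):
    # findall() scans the whole string and builds a list of every match,
    # even though only the first one is used.
    matches = BODY_RE.findall(html)
    return matches[0] if matches else ""


def extract_body_search(html):
    # search() returns as soon as the first match is found.
    match = BODY_RE.search(html)
    return match.group(1) if match else ""


def normalize_replace(text):
    # Chained str.replace() calls: one pass over the text per character.
    return text.replace("\r", " ").replace("\n", " ").replace("\t", " ")


def normalize_sub(text):
    # One compiled pattern covers all three characters in a single pass.
    return CONTROL_WS_RE.sub(" ", text)


if __name__ == "__main__":
    sample = "<html><body class='x'>line one\r\nline\ttwo</body></html>"
    body = extract_body_search(sample)
    print(repr(normalize_sub(body)))
```

Both pairs return the same result for the same input, so the change is only about how much work is done to get there.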

I didn't want to just throw it away; maybe you'll find it interesting or useful in some way: count-pages.patch.

Thanks for your plugin!
