Hey, I modified the code a bit. My plan was to speed up the APNX-accurate algorithm, but unfortunately even my alternative version only performs at around the same speed (within about a second either way). One thing it might handle better is books with strange and/or broken markup (although Calibre should prevent that during the conversion step).
I also made slight changes in _read_epub_contents() and _extract_body_text() that avoid some unicode conversion steps and that use the re module a bit more efficiently (regex_object.search() instead of regex_object.findall(), re.sub() instead of str.replace()).
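To illustrate the kind of re usage I mean, here is a minimal sketch. The patterns, the function name extract_body_text and the sample markup are made up for this example and are not the actual code from the plugin or the patch:

```python
import re

# Hypothetical patterns for illustration only (not taken from the plugin).
# Compiling once avoids re-parsing the pattern on every call.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)
# Matches any run of tags and/or whitespace characters.
TAGS_AND_SPACE_RE = re.compile(r"(?:<[^>]+>|\s)+")


def extract_body_text(html):
    """Return the text inside <body>, with tags and whitespace runs collapsed."""
    # search() stops at the first match, while findall() would scan the
    # whole string and build a list even if only the first hit is used.
    match = BODY_RE.search(html)
    if match is None:
        return ""
    # A single sub() pass replaces what would otherwise be several
    # chained str.replace() calls.
    return TAGS_AND_SPACE_RE.sub(" ", match.group(1)).strip()


sample = "<html><body class='x'><p>Hello</p>\n<p>world</p></body></html>"
print(extract_body_text(sample))  # prints "Hello world"
```

Whether this is actually faster depends on the input, so it's worth timing on a real book before relying on it.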
I didn't want to just throw it away; maybe you'll find it interesting or useful in some way:
count-pages.patch.
Thanks for your plugin!