@Kovid and @davidfor,
I've been working my way through my Editor plugins and I think I've found something a bit odd with the count_words function in calibre.spell.break_iterator in calibre 4.99.2.
I sometimes call count_words in my Editor plugins before and after a plugin's mass cleanups, to raise an immediate alarm if text content was accidentally removed. It seems to run very much slower in 4.99.2 than in 4.8, and I wonder if you can shed any light on why that might be?
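To make the before/after check concrete, here is a minimal sketch of the idea. It is not my actual plugin code: check_word_count and fallback_count_words are hypothetical names, and the regex counter is only a stand-in so the example runs outside calibre. Inside calibre you would pass the real count_words as the counter instead.

```python
import re


def fallback_count_words(text):
    # Stand-in counter (assumption): a plain regex split, NOT calibre's
    # break-iterator based count_words. It only keeps the sketch runnable.
    return len(re.findall(r"\w+", text))


def check_word_count(before_text, after_text,
                     count_words=fallback_count_words, tolerance=0):
    """Compare word counts before and after a cleanup.

    Returns (ok, before, after). ok is False when more than
    `tolerance` words went missing during the cleanup.
    """
    before = count_words(before_text)
    after = count_words(after_text)
    lost = before - after
    return lost <= tolerance, before, after


if __name__ == "__main__":
    original = "It was the best of times, it was the worst of times."
    cleaned = "It was the best of times."
    ok, before, after = check_word_count(original, cleaned)
    if not ok:
        print("ALARM: word count dropped from %d to %d" % (before, after))
```

The counter is passed in as a parameter so that the same check works with whichever word counter is available.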
@davidfor, I know count_words is also an integral part of the Count Pages plugin.
I ran some tests to illustrate the problem. I selected 4 books (2 long, 1 medium, 1 short) in EPUB2 format. They all validate clean in both calibre CheckBook and EpubCheck.
For each EPUB I counted the words (Python script attached in spoiler below) using 3 different versions of calibre in debug mode:
- Win 64bit 4.99.2
- Win 32bit 4.8
- Win 64bit 4.99.2 run from source (fully up-to-date), shown as 4.99.2* in the output below
These are the results. As you can see, v4.99.2 runs roughly 10 to 36 times slower than v4.8:
Code:
1. Alexandre Dumas - The Count of Monte Cristo
DEBUG: 0.0 Start: calibre: 4.99.2 [64bit]; ispy3: True
DEBUG: 7.4 End: Wordcount: 496791
DEBUG: 0.0 Start: calibre: 4.8; ispy3: False
DEBUG: 0.7 End: Wordcount: 496791
DEBUG: 0.0 Start: calibre: 4.99.2* [64bit]; ispy3: True
DEBUG: 7.4 End: Wordcount: 496791
2. Peter F Hamilton - The Naked God
DEBUG: 0.0 Start: calibre: 4.99.2 [64bit]; ispy3: True
DEBUG: 21.9 End: Wordcount: 455220
DEBUG: 0.0 Start: calibre: 4.8; ispy3: False
DEBUG: 0.6 End: Wordcount: 455220
DEBUG: 0.0 Start: calibre: 4.99.2* [64bit]; ispy3: True
DEBUG: 22.0 End: Wordcount: 455220
3. EF Benson - Mapp and Lucia
DEBUG: 0.0 Start: calibre: 4.99.2 [64bit]; ispy3: True
DEBUG: 2.8 End: Wordcount: 114695
DEBUG: 0.0 Start: calibre: 4.8; ispy3: False
DEBUG: 0.2 End: Wordcount: 114695
DEBUG: 0.0 Start: calibre: 4.99.2* [64bit]; ispy3: True
DEBUG: 2.9 End: Wordcount: 114695
4. Evelyn Waugh - Vile Bodies
DEBUG: 0.0 Start: calibre: 4.99.2 [64bit]; ispy3: True
DEBUG: 1.3 End: Wordcount: 76836
DEBUG: 0.0 Start: calibre: 4.8; ispy3: False
DEBUG: 0.1 End: Wordcount: 76836
DEBUG: 0.0 Start: calibre: 4.99.2* [64bit]; ispy3: True
DEBUG: 1.3 End: Wordcount: 76836
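For anyone who wants to check the slowdown factors, this short sketch recomputes them from the elapsed times in the listing above (packaged 4.99.2 against 4.8). The timings are only reported to 0.1 s, so the ratios are approximate, especially for the shortest book. The names timings and slowdown are just for illustration.

```python
# Elapsed seconds copied from the debug output above:
# (calibre 4.99.2 64bit, calibre 4.8)
timings = {
    "The Count of Monte Cristo": (7.4, 0.7),
    "The Naked God": (21.9, 0.6),
    "Mapp and Lucia": (2.8, 0.2),
    "Vile Bodies": (1.3, 0.1),
}


def slowdown(new_seconds, old_seconds):
    # How many times longer the newer version took than the older one.
    return new_seconds / old_seconds


ratios = {title: round(slowdown(new, old), 1)
          for title, (new, old) in timings.items()}

if __name__ == "__main__":
    for title, ratio in ratios.items():
        print("%-28s %5.1fx slower" % (title, ratio))
```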
This was the simple script I used: