Beta - Change method used for word count
I wasn't going to do this, but Kovid added an extra method to calibre, and the existing count wasn't taking the language of the book into account, so...
Attached is a beta version of the plugin that uses the ICU Word Iterator to do the count. The count uses the language set for the book. If no language is set, it defaults to English. For older versions of calibre that do not include the methods needed for the ICU Word Iterator, the plugin falls back to the older method.
Note that in this beta the count is actually done twice. The old method always runs and its result is printed to the job log. Then the plugin attempts to use the new method. I have done this to get an idea of how much difference there is between the two methods. For English, the difference is small enough that it doesn't bother me, but for other languages it might be larger. I don't have enough non-English test books to check, so I would be interested to know whether there is a significant difference for any language.
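For anyone curious how the fallback and the double count fit together, here is a rough Python sketch. It is not the plugin's code: the function names are hypothetical, the regex is only a stand-in for the old method, the log wording is made up, and it uses the third-party PyICU package (the `icu` module) rather than whatever calibre exposes internally. If PyICU isn't installed, it falls back to the regex count, much like the plugin does on older calibre versions.

```python
import re
from typing import Callable, Optional

# Stand-in for the "old" method (an assumption, not the plugin's algorithm):
# runs of word characters, optionally joined by apostrophes, count as words.
_WORD_RE = re.compile(r"\w+(?:'\w+)*")


def count_words_regex(text: str) -> int:
    """Language-independent count: number of regex word matches."""
    return len(_WORD_RE.findall(text))


def _icu_counter() -> Optional[Callable[[str, str], int]]:
    """Return an ICU-based counter if PyICU is importable, else None."""
    try:
        from icu import BreakIterator, Locale  # third-party PyICU package
    except ImportError:
        return None

    def count(text: str, lang: str) -> int:
        it = BreakIterator.createWordInstance(Locale(lang))
        it.setText(text)
        words = 0
        for _boundary in it:
            # ICU tags the segment just passed with a rule status; values
            # >= 100 (UBRK_WORD_NONE_LIMIT) mean numbers, letters, kana or
            # ideographs, while spaces and punctuation fall below 100.
            if it.getRuleStatus() >= 100:
                words += 1
        return words

    return count


def count_words(text: str, lang: Optional[str] = None,
                log: Callable[[str], None] = print) -> int:
    """Hypothetical driver: log both counts, prefer ICU when available."""
    lang = lang or "en"  # default to English when the book has no language
    old = count_words_regex(text)
    log(f"Old method word count: {old}")
    icu_count = _icu_counter()
    if icu_count is None:
        return old  # no ICU word iterator available: keep the old count
    new = icu_count(text, lang)
    log(f"ICU ({lang}) word count: {new} (difference {new - old:+d})")
    return new
```

Passing a different `log` callable is only a convenience for the sketch; in the plugin the counts end up in the job log.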
To view the two counts, open the job list (click "Jobs" in the bottom right of the calibre window), select the Count Pages job and click the "Show job details" button.
If anyone finds a problem, please report it. If none are found, and there are no objections to changing the word count algorithm, I will arrange to release this sometime next week.
Last edited by davidfor; 01-08-2016 at 12:10 AM.
Reason: Fixed the file name. Same contents, but correctly named