Quote:
Originally Posted by JSWolf
Under Word count options there is a check for Use ICU algorithm for counting words.
ICU on
Code:
Page count: 255.0
Word count using icu_wordcount - trying to count_words
Word count - used count_words: 94052
Word count: 94052
ICU off
Code:
Page count: 255.0
Word count using older method - trying to count_words
Word count: 96049
I think that's a pretty big difference. I can use the scramble plugin to scramble it and then attach it if you'd like.
Jon: I never would have worked out from your original post that you were talking about the difference between the ICU-based word count and the original algorithm.
In any case, go back to the discussion in this thread at this time last year. You were the one that started that "discussion" by pointing out a possible bug. And that was the point of adding the ICU method, as it seemed to handle some things in a better way and was language aware. Back then, I did post explanations of some of the differences if you want to look.
Also, both methods rely on code in calibre. If that is updated, then it might change the count the plugin produces.
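As a rough illustration of how two counting rules can disagree on the same text, here is a small sketch. It is not the code calibre or the plugin uses; both functions and the sample sentence are made up for the example, and the real ICU method is considerably more sophisticated.

```python
import re

# Hypothetical counters for illustration only; neither is the plugin's
# or calibre's actual implementation.

def count_whitespace(text):
    """Count words as runs of non-whitespace characters."""
    return len(text.split())

def count_word_chars(text):
    """Count words as runs of letters, digits or underscores.

    Apostrophes and hyphens end a run, so "It's" and "well-known"
    each count as two words under this rule.
    """
    return len(re.findall(r"\w+", text))

sample = "It's a well-known e-book reader"
print(count_whitespace(sample))   # whitespace rule
print(count_word_chars(sample))   # word-character rule
```

On this one sentence the two rules give 5 and 8, so how hyphens and apostrophes are treated could plausibly account for a gap of a couple of percent over a whole book.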
Personally, I expect both numbers to be wrong. I tend to think the ICU method is the more accurate, but that is based on me counting very small samples. I take all the statistics produced by the plugin as approximations. And during last year's discussion I was very tempted to introduce a "nearest 1000" option. Of course, that would raise the argument of rounding vs truncating.
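To put the two numbers above in perspective, here is a quick sketch of the gap and of what a "nearest 1000" option might do with them. The function names are hypothetical; no such option exists in the plugin.

```python
# Counts taken from the two runs quoted above.
counts = {"ICU on": 94052, "ICU off": 96049}

def round_to_nearest(count, step=1000):
    """Round to the nearest multiple of step (halves round up)."""
    return ((count + step // 2) // step) * step

def truncate_to(count, step=1000):
    """Drop everything below the last full multiple of step."""
    return (count // step) * step

difference = counts["ICU off"] - counts["ICU on"]
percent = 100 * difference / counts["ICU on"]
print(f"Difference: {difference} words ({percent:.1f}%)")

for label, n in counts.items():
    print(label, n, round_to_nearest(n), truncate_to(n))
```

The difference is 1997 words, a little over 2%. For these two counts rounding and truncating happen to agree (94000 and 96000), but a count like 94600 would round to 95000 and truncate to 94000, which is where the argument would start.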