Quote:
Originally Posted by keturn
I've got an epub file here that took 115 minutes to count 1.84 million words.
With "ICU" un-checked, it takes about five seconds. (and the result is about 9% different.)
Is that expected behavior, or should we do some debugging?
I have to go with "wow". I happened to have a 1.4 million word book, and it took 57 minutes on my work machine with ICU selected and 9 seconds without. I would expect the ICU method to be slower, but I wasn't expecting that. I think it is something new, as I ran the count on a 6 million word book a month or so ago, and I'm sure I was using the ICU count. Both counts are done using methods built into calibre. I'll have to check an older version to see if it is different.
As to the difference in the result, that is to do with how the two algorithms define a word. This has been "discussed" in this thread a few times. No one won.
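
As a rough illustration of how the definition of a "word" moves the total, here is a small Python sketch. Neither function is the code behind either count option in calibre, and neither follows ICU's actual rules. The function names and the sample sentence are made up for the example. They are simply two plausible definitions that disagree on apostrophes and hyphens.

```python
import re


def count_whitespace_tokens(text: str) -> int:
    """Count words as runs of non-whitespace characters.

    Hypothetical definition for illustration: "isn't" and "well-known"
    each count as one word, and trailing punctuation stays attached.
    """
    return len(text.split())


def count_letter_runs(text: str) -> int:
    """Count words as unbroken runs of letters.

    Hypothetical definition for illustration: apostrophes and hyphens
    split a word, so "isn't" counts as two words ("isn" and "t").
    """
    return len(re.findall(r"[^\W\d_]+", text))


sample = "It's a well-known fact: e-mail isn't dead."

whitespace_count = count_whitespace_tokens(sample)
letter_run_count = count_letter_runs(sample)

print(f"whitespace tokens: {whitespace_count}")
print(f"letter runs:       {letter_run_count}")
```

On that one sentence the two definitions give 7 and 11. A gap of a few percent across a whole book is not surprising when the text has a lot of contractions and hyphenated words.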