12-04-2019, 08:13 PM   #1302
davidfor
Grand Sorcerer
Quote:
Originally Posted by keturn
I've got an epub file here that took 115 minutes to count 1.84 million words.

With "ICU" un-checked, it takes about five seconds. (and the result is about 9% different.)

Is that expected behavior, or should we do some debugging?
I have to go with "wow". I happened to have a 1.4 million word book, and it took 57 minutes on my work machine with ICU selected and 9 seconds without. I would expect the ICU method to be slower, but I wasn't expecting that. I think it is something new, as I ran the count on a 6 million word book a month or so ago and I'm sure I was using the ICU count. Both counts are done using methods built into calibre. I'll have to check an older version to see if it is different.

As to the difference in the result, that is to do with how the two algorithms define a word. This has been "discussed" in this thread a few times. No one won.
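
To show how the definition of a word alone can move the total, here is a minimal sketch in Python. It is not calibre's code, and neither function is the ICU or the non-ICU method; both function names and the sample sentence are made up for illustration. One rule treats anything between whitespace as a word, the other treats every run of letters and digits as a word.

```python
import re


def count_whitespace_words(text: str) -> int:
    """Hypothetical rule A: a word is any run of non-whitespace characters."""
    return len(text.split())


def count_alnum_runs(text: str) -> int:
    """Hypothetical rule B: a word is any run of letters, digits or underscores,
    so apostrophes and hyphens split a token into several words."""
    return len(re.findall(r"\w+", text))


sample = "It's a well-known, state-of-the-art e-book reader."

print("whitespace rule:", count_whitespace_words(sample))
print("alphanumeric-run rule:", count_alnum_runs(sample))
```

The two rules disagree on every hyphenated or apostrophised token, so a book with a lot of contractions or compound words will show a larger gap between the counts than one without.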