Old 01-23-2016, 01:53 AM   #10
davidfor
Grand Sorcerer
Posts: 24,905
Karma: 47303824
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
Quote:
Originally Posted by BetterRed
Whatever this is


Code:
Same book 

EPUB - Count Pages    = 20,563 (20,886)
EPUB - Calibre Editor = 21,405 (20,793)
EPUB - Sigil          = 21,382 (20,753)

RTF/DOCX - Word       = 20,751 (20,749)

TXT - Notepad++       =        (20,751)
That's even more fun. While looking at how Count Pages did the counting, I forgot to look at what it counted. The three tools are each counting different things.

Based on a little experimenting and reading calibre code:

- calibre editor counts the book text, the text in the alt and title attributes of tags, the text in the metadata, and the text in the title tags of each internal file.
- Sigil counts the book text and the title tags.
- Count Pages just counts the book text.
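
To make the difference concrete, here is a rough sketch of how the three scopes can give three totals for the same file. It is not the code of any of these tools: the class and function names are made up for illustration, it uses Python's standard html.parser, it splits words on whitespace only, and it ignores the metadata text that the calibre editor also counts.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers body text, <title> text and
    alt/title attribute text into separate lists."""

    def __init__(self):
        super().__init__()
        self.body_text = []
        self.title_text = []
        self.attr_text = []
        self._in_title = False
        self._in_body = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        # Text that lives in attributes rather than in the page itself.
        for name, value in attrs:
            if name in ("alt", "title") and value:
                self.attr_text.append(value)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title_text.append(data)
        elif self._in_body:
            self.body_text.append(data)


def count_words(chunks):
    # Simplest possible definition of a word: whitespace-separated runs.
    return sum(len(chunk.split()) for chunk in chunks)


def count_variants(html):
    collector = TextCollector()
    collector.feed(html)
    body = count_words(collector.body_text)
    title = count_words(collector.title_text)
    attrs = count_words(collector.attr_text)
    return {
        "body_only": body,                       # book text only
        "body_plus_title": body + title,         # book text + title tags
        "body_title_attrs": body + title + attrs,  # + alt/title attributes
    }


SAMPLE = """<html><head><title>Chapter One</title></head>
<body><h1>The Start</h1>
<p>It was a dark night.</p>
<img src="map.png" alt="A map of the town" title="Town map"/>
</body></html>"""

print(count_variants(SAMPLE))
```

On a real book the title tags and attributes add up across every internal file, which is the kind of gap the numbers in the quote show.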

The other reason for the difference is, of course, what they consider to be a word. Based on some of the DLLs included with Sigil, I think it uses the same ICU method that the calibre editor uses.
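
As a small illustration of how much the definition of a word matters, the sketch below counts the same sentence with three different rules. None of these is the ICU method mentioned above. They are stand-in tokenizers built from the standard re module, and the function names are invented for this example.

```python
import re

TEXT = "A well-known e-book reader: it's 20,563 words."


def count_whitespace(text):
    # A word is any run of non-whitespace characters.
    return len(text.split())


def count_word_chars(text):
    # A word is any run of letters, digits or underscores, so hyphens,
    # apostrophes and commas all split words.
    return len(re.findall(r"\w+", text))


def count_joined(text):
    # As above, but hyphens and apostrophes inside a word keep it together.
    return len(re.findall(r"\w+(?:['-]\w+)*", text))


print(count_whitespace(TEXT), count_word_chars(TEXT), count_joined(TEXT))
```

The three rules disagree even on one short sentence, so small differences between tools over a whole book are to be expected.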