01-22-2016, 10:38 AM | #1 |
mostly an observer
Posts: 1,515
Karma: 987654
Join Date: Dec 2012
Device: Kindle
|
Possible to get a word count in Calibre?
I have an epub consisting of 35 files. Is there any way to get a word count of the book in Calibre (or Calibre Editor)?
Never mind -- I found it! https://www.mobileread.com/forums/sho...d.php?t=242335 Thanks! Last edited by Notjohn; 01-22-2016 at 10:53 AM. |
01-22-2016, 10:48 AM | #2 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
You could have a look at the Count Pages plugin which also does Wordcount and some other metrics.
|
01-22-2016, 12:56 PM | #3 |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
@Notjohn - you can get word counts from both the Sigil and Calibre book editors, under Tools->Reports. In Sigil they're in the "HTML" report; in Calibre they're in the "Words" report.
The Count Pages plugin needs a calibre library in which to store the results. BR |
01-22-2016, 01:02 PM | #4 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
He knows -- he posted the same question in the Sigil forum precisely two minutes before he double-posted to this forum.
https://www.mobileread.com/forums/sho...d.php?t=270111 |
01-22-2016, 02:23 PM | #5 | |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
Same book (English) Code:
EPUB - Count Pages = 20,563
EPUB - Calibre Editor = 21,405
EPUB - Sigil = 21,382
RTF/DOCX - Word = 20,751
BTW that's the current official release of Count Pages BR |
|
01-22-2016, 04:09 PM | #6 | |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Quote:
What count method did you set in Count Pages? |
|
01-22-2016, 04:56 PM | #7 |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Whatever this is
It was a 'new' book (one of today's intake), so the epub had copious quantities of crud. After delousing, the editor counts changed somewhat, so they must count crud (spans, superfluous styles, etc.). Deloused counts are in parentheses. Code:
Same book
EPUB - Count Pages = 20,563 (20,886)
EPUB - Calibre Editor = 21,405 (20,793)
EPUB - Sigil = 21,382 (20,753)
RTF/DOCX - Word = 20,751 (20,749)
TXT - Notepad++ = (20,751)
You know my view on this - near enough, is good enough. BR Last edited by BetterRed; 01-22-2016 at 05:43 PM. Reason: Added count for TXT |
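To put a number on "near enough", here is a minimal sketch that computes the spread between the lowest and highest count as a percentage of the lowest. The figures are copied from the table above; the function name `spread_percent` and the choice of the lowest count as the baseline are assumptions made for illustration only.

```python
# Counts taken from the table in this post (before and after delousing).
RAW = {
    "Count Pages": 20563,
    "Calibre Editor": 21405,
    "Sigil": 21382,
    "Word": 20751,
}
DELOUSED = {
    "Count Pages": 20886,
    "Calibre Editor": 20793,
    "Sigil": 20753,
    "Word": 20749,
    "Notepad++": 20751,
}


def spread_percent(counts):
    """Return (max - min) as a percentage of the minimum count."""
    low, high = min(counts.values()), max(counts.values())
    return 100.0 * (high - low) / low


if __name__ == "__main__":
    print(f"raw spread:      {spread_percent(RAW):.2f}%")       # 4.09%
    print(f"deloused spread: {spread_percent(DELOUSED):.2f}%")  # 0.66%
```

On these figures the tools disagree by about 4% on the raw epub and by well under 1% once the markup is cleaned up.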
01-22-2016, 05:30 PM | #8 | |
Resident Curmudgeon
Posts: 74,025
Karma: 129333114
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
01-22-2016, 05:55 PM | #9 | |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
I will look at the new version if and when it's released. If it does not have the option of retaining the current algorithm, irrespective of the fact that according to you it is flawed, I shall not be installing it. For my purposes consistency trumps any puritan notion of accuracy. BR |
|
01-23-2016, 01:53 AM | #10 | |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Based on a little experimenting and reading calibre code:
- The calibre editor counts the book text, text in the alt and title attributes of tags, text in the metadata, and text in the title tags of each internal file.
- Sigil counts the book text and the title tags.
- Count Pages just counts the book text.
The other reason for the difference is, of course, what they consider to be a word. Based on some of the DLLs included with Sigil, I think it uses the same ICU method that the calibre editor uses. |
|
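The effect of those different scopes can be sketched with the Python standard library. This is not the code used by calibre, Sigil, or Count Pages; it is an illustration that counts the same XHTML three ways: body text only, body text plus `<title>` tags, and body text plus `<title>` tags plus `alt`/`title` attributes. The class `TextCollector`, the function `count_words_in_scope`, the regular expression, and the sample markup are all hypothetical, and OPF metadata is not modelled.

```python
import re
from html.parser import HTMLParser

# Assumed word definition for this sketch: runs of word characters,
# optionally joined by an apostrophe or hyphen.
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


class TextCollector(HTMLParser):
    """Collect text chunks from XHTML according to the chosen scope."""

    def __init__(self, include_title_tag=False, include_attrs=False):
        super().__init__(convert_charrefs=True)
        self.include_title_tag = include_title_tag
        self.include_attrs = include_attrs
        self.chunks = []
        self._skip_depth = 0    # inside <script> or <style>
        self._in_title = False  # inside the <title> element

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        if self.include_attrs:
            for name, value in attrs:
                if name in ("alt", "title") and value:
                    self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_title and not self.include_title_tag:
            return
        self.chunks.append(data)


def count_words_in_scope(xhtml, **scope):
    """Count words in one XHTML file for the given scope options."""
    parser = TextCollector(**scope)
    parser.feed(xhtml)
    parser.close()
    return sum(len(WORD_RE.findall(chunk)) for chunk in parser.chunks)


SAMPLE = """<html><head><title>Chapter One</title></head>
<body><p>It was a dark night.</p>
<img src="moon.png" alt="A full moon" title="Moon"/></body></html>"""

if __name__ == "__main__":
    print(count_words_in_scope(SAMPLE))                            # 5
    print(count_words_in_scope(SAMPLE, include_title_tag=True))    # 7
    print(count_words_in_scope(SAMPLE, include_title_tag=True,
                               include_attrs=True))                # 11
```

Across a book with dozens of internal files and many images, the same kind of scope difference could plausibly account for gaps of a few hundred words between tools.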
01-23-2016, 04:02 AM | #11 | |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
The contentious issue is usually hyphenated words: one word or many, and if many, what to do about 'not in dictionary' parts, and the ignoring of SHY (soft hyphens).
Calibre's spell checker also checks the spelling in metadata. So if your book is written in EN-US and you're using an EN-US dictionary, but the Comments contain reviews from the LRB and The Torygraph, you're likely to get misspelt words emanating from Comments. Would anyone mark up some of the Comments as EN-GB, assuming one can?
IMO, by default, only the substantial sections of the work should be counted and spell checked, not the scaffolding that glues it together or the marketing blurb.
I should point out that the 'book' in the counts in my earlier posts had no front or back matter, index, bibliography, or notes - it was an essay.
I just did some similar counts on a real book (The Bankers' New Clothes, Admati and Hellwig), which has lots of notes, some tables and graphs, a bibliography, and a long index. Sigil reports 163,081, Calibre Editor reports 163,959, and 'current official' Count Pages reports 173,007. How does that accord with your research? BR |
|
01-23-2016, 06:17 AM | #12 | ||||
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
Quote:
Quote:
Quote:
Quote:
|
||||
01-23-2016, 07:02 AM | #13 | |
null operator (he/him)
Posts: 20,579
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Joining two words with a hyphen is done for different reasons, from Hyphens | Punctuation Rules
Quote:
I would count thirty-five and self-obsessed as one word each. BR Last edited by BetterRed; 01-23-2016 at 07:06 AM. |
|
01-23-2016, 07:17 AM | #14 |
Well trained by Cats
Posts: 29,809
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
BR
Didn't word count (paid) used to really be character count (excluding whitespace) divided by N? That was before we had markup. So wouldn't converting to plain text (which removes all tags) and subtracting all the spaces, tabs, and LF/CR get you 'just the letters'? |
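The character-based estimate described here can be sketched in a few lines. The divisor N is not given in the thread, so the default of 5 below is purely an assumption, as is the function name `estimate_words_by_chars`. The input is expected to be plain text with tags already stripped.

```python
def estimate_words_by_chars(text, chars_per_word=5.0):
    """Estimate a word count as non-whitespace characters divided by N.

    chars_per_word is the N from the post; 5.0 is an assumed default.
    """
    non_whitespace = sum(1 for ch in text if not ch.isspace())
    return non_whitespace / chars_per_word


if __name__ == "__main__":
    sample = "one two\tthree\r\nfour"
    # 15 non-whitespace characters: spaces, tabs and CR/LF are ignored.
    print(estimate_words_by_chars(sample))        # 3.0
    print(estimate_words_by_chars(sample, 6.0))   # 2.5
```

Note that this counts punctuation as well as letters; getting strictly 'just the letters' would need an extra filter such as `ch.isalpha()`.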
01-23-2016, 07:35 AM | #15 |
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
|
BR: At the start of that article, it says:
Code:
Hyphens' main purpose is to glue words together.
And the devil in me wants to mention that your example is for a "compound adjective". Doesn't that mean the non-hyphenated version should be counted as a single word? Yeah, I'm stretching, but, what the hell.
Anyway, the big problem is that without a very complete dictionary and correct handling of the grammar, there is no way to decide between the two. For simplicity, you have to decide that a hyphen is either a word delimiter or part of the word. |
|
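The two hyphen policies can be compared with a small sketch. It makes no claim about how calibre, Sigil, or Count Pages tokenise text (the thread suggests the editors use ICU); the function `count_words_hyphen`, the regular expressions, and the choice to strip soft hyphens (SHY, U+00AD) before counting are assumptions for illustration.

```python
import re

SOFT_HYPHEN = "\u00ad"  # SHY: an invisible hyphenation hint, not a real hyphen


def count_words_hyphen(text, hyphen_joins=True):
    """Count words, treating '-' as part of a word or as a delimiter."""
    # Drop soft hyphens first so a hinted word is not split in two.
    text = text.replace(SOFT_HYPHEN, "")
    if hyphen_joins:
        pattern = r"\w+(?:-\w+)*"   # thirty-five is one word
    else:
        pattern = r"\w+"            # thirty-five is two words
    return len(re.findall(pattern, text))


if __name__ == "__main__":
    sentence = "A self-obsessed man of thirty-five."
    print(count_words_hyphen(sentence, hyphen_joins=True))    # 5
    print(count_words_hyphen(sentence, hyphen_joins=False))   # 7
    print(count_words_hyphen("hy\u00adphen\u00adation"))      # 1
```

Either policy is consistent on its own; the counts only diverge between tools when they pick different ones.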