Go Back   MobileRead Forums > E-Book Software > Calibre

Old 01-22-2016, 10:38 AM   #1
Notjohn
mostly an observer
Possible to get a word count in Calibre?

I have an epub consisting of 35 files. Is there any way to get a word count of the book in Calibre (or Calibre Editor)?

Never mind -- I found it! https://www.mobileread.com/forums/sho...d.php?t=242335

Thanks!

Last edited by Notjohn; 01-22-2016 at 10:53 AM.
Old 01-22-2016, 10:48 AM   #2
jackie_w
Grand Sorcerer
You could have a look at the Count Pages plugin which also does Wordcount and some other metrics.
Old 01-22-2016, 12:56 PM   #3
BetterRed
null operator (he/him)
@Notjohn - you can get word counts from the Sigil and Calibre book editors, they're in Tools->Reports. In Sigil they're in the "HTML" report, in Calibre they're in the "Words" report

The Count Pages plugin needs a calibre library in which to store the results.

BR
Old 01-22-2016, 01:02 PM   #4
eschwartz
Ex-Helpdesk Junkie
He knows -- he posted the same question in the Sigil forum precisely two minutes before he double-posted to this forum.

https://www.mobileread.com/forums/sho...d.php?t=270111
Old 01-22-2016, 02:23 PM   #5
BetterRed
null operator (he/him)
Quote:
Originally Posted by eschwartz View Post
He knows -- he posted the same question in the Sigil forum precisely two minutes before he double-posted to this forum.

https://www.mobileread.com/forums/sho...d.php?t=270111
<sigh>

Same book (English)

Code:
EPUB - Count Pages    = 20,563
EPUB - Calibre Editor = 21,405
EPUB - Sigil          = 21,382

RTF/DOCX - Word       = 20,751
IMO a 4.1% spread is quite a lot, it's a sample of one so it could be an outlier.
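
For anyone checking the arithmetic, here is a minimal sketch of how that spread figure can be reproduced. It assumes "spread" means the gap between the highest and lowest count as a percentage of the lowest; the function name `spread_percent` is made up for this illustration.

```python
# Word counts reported for the same EPUB (figures from the post above).
counts = {
    "Count Pages": 20563,
    "Calibre Editor": 21405,
    "Sigil": 21382,
    "Word": 20751,
}


def spread_percent(values):
    """Return (max - min) as a percentage of the smallest value.

    Assumption: this is the definition of "spread" used in the post.
    """
    values = list(values)
    lowest, highest = min(values), max(values)
    return (highest - lowest) / lowest * 100


print(f"{spread_percent(counts.values()):.1f}%")  # prints 4.1%
```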

BTW that's the current official release of Count Pages

BR
Old 01-22-2016, 04:09 PM   #6
theducks
Well trained by Cats
Quote:
Originally Posted by BetterRed View Post
<sigh>

Same book (English)

Code:
EPUB - Count Pages    = 20,563
EPUB - Calibre Editor = 21,405
EPUB - Sigil          = 21,382

RTF/DOCX - Word       = 20,751
IMO a 4.1% spread is quite a lot, it's a sample of one so it could be an outlier.

BTW that's the current official release of Count Pages

BR
BR - What count method did you set in Count Pages?
Old 01-22-2016, 04:56 PM   #7
BetterRed
null operator (he/him)
Quote:
Originally Posted by theducks View Post
BR - What count method did you set in Count Pages?
Whatever this is

[Attached image: Capture.JPG]

It was a 'new' book (one of today's intake), consequently the epub had copious quantities of crud. After delousing, the editor counts changed somewhat, so they must count crud (spans, superfluous styles, etc)

Deloused counts are in parentheses

Code:
Same book 

EPUB - Count Pages    = 20,563 (20,886)
EPUB - Calibre Editor = 21,405 (20,793)
EPUB - Sigil          = 21,382 (20,753)

RTF/DOCX - Word       = 20,751 (20,749)

TXT - Notepad++       =        (20,751)
So, if you're of the view that Word is the arbiter of all things good under the sun... then the winner is Sigil.
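
A rough sketch of that comparison, using the deloused figures from the table above and treating Word's deloused count as the baseline. The names `deviations` and `word_baseline` are invented for this example.

```python
# Deloused counts from the table above; Word's deloused count is the baseline.
deloused = {
    "Count Pages": 20886,
    "Calibre Editor": 20793,
    "Sigil": 20753,
}
word_baseline = 20749


def deviations(tool_counts, baseline):
    """Map each tool to its signed difference from the baseline count."""
    return {tool: count - baseline for tool, count in tool_counts.items()}


diffs = deviations(deloused, word_baseline)
closest = min(diffs, key=lambda tool: abs(diffs[tool]))

for tool, diff in diffs.items():
    print(f"{tool:15s} {diff:+d}")
print("Closest to Word:", closest)
```

On these numbers Sigil is 4 words away from Word, the Calibre editor 44, and Count Pages 137.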

You know my view on this - near enough, is good enough.

BR

Last edited by BetterRed; 01-22-2016 at 05:43 PM. Reason: Added count for TXT
Old 01-22-2016, 05:30 PM   #8
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by BetterRed View Post
<sigh>

Same book (English)

Code:
EPUB - Count Pages    = 20,563
EPUB - Calibre Editor = 21,405
EPUB - Sigil          = 21,382

RTF/DOCX - Word       = 20,751
IMO a 4.1% spread is quite a lot, it's a sample of one so it could be an outlier.

BTW that's the current official release of Count Pages

BR
Are you using the new Count Pages beta that uses a more accurate routine to count words? Check the Count Pages thread. It's in there.
Old 01-22-2016, 05:55 PM   #9
BetterRed
null operator (he/him)
Quote:
Originally Posted by JSWolf View Post
Are you using the new Count Pages beta that uses a more accurate routine to count words? Check the Count Pages thread. It's in there.
Quote:
Originally Posted by BetterRed View Post
BTW that's the current official release of Count Pages
Which syllable of 'official release' don't you understand?

I will look at the new version if and when it's released. If it does not have the option of retaining the current algorithm, irrespective of the fact that according to you it is flawed, I shall not be installing it. For my purposes consistency trumps any puritan notion of accuracy.

BR
Old 01-23-2016, 01:53 AM   #10
davidfor
Grand Sorcerer
Quote:
Originally Posted by BetterRed View Post
Whatever this is


Code:
Same book 

EPUB - Count Pages    = 20,563 (20,886)
EPUB - Calibre Editor = 21,405 (20,793)
EPUB - Sigil          = 21,382 (20,753)

RTF/DOCX - Word       = 20,751 (20,749)

TXT - Notepad++       =        (20,751)
That's even more fun. While looking at how Count Pages did the counting, I forgot to look at what it counted. They are each counting different things.

Based on a little experimenting and reading calibre code:

- calibre editor counts book text, text in the alt and title attributes of tags, text in the metadata and in the title tags of each internal file.
- Sigil counts the book text and the title tags.
- Count Pages just counts the book text.

The other reason for the difference is of course what they consider to be a word. Based on some of the DLLs included with Sigil, I think it uses the same ICU method that the calibre editor uses.
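
To show how the choice of what gets counted moves the total, here is a small sketch using only Python's standard library. It is not the code of calibre, Sigil, or Count Pages: the class `TextCollector`, the function `count_words`, and the simple regex are all assumptions made for illustration, and the regex is a stand-in for the ICU word breaking mentioned above. Package metadata, which the calibre editor also counts, is left out.

```python
import re
from html.parser import HTMLParser

# Simplistic word pattern (an assumption, not ICU word segmentation).
WORD_RE = re.compile(r"\w+(?:['’-]\w+)*")


class TextCollector(HTMLParser):
    """Collect body text, optionally also <title> text and alt/title attributes."""

    def __init__(self, include_title=False, include_attrs=False):
        super().__init__()
        self.include_title = include_title
        self.include_attrs = include_attrs
        self.in_body = False
        self.in_title = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "title":
            self.in_title = True
        if self.include_attrs:
            for name, value in attrs:
                if name in ("alt", "title") and value:
                    self.chunks.append(value)

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_body or (self.in_title and self.include_title):
            self.chunks.append(data)


def count_words(xhtml, **options):
    """Count words in one XHTML file under the chosen inclusion rules."""
    collector = TextCollector(**options)
    collector.feed(xhtml)
    collector.close()
    return len(WORD_RE.findall(" ".join(collector.chunks)))


SAMPLE = (
    "<html><head><title>Chapter One</title></head>"
    "<body><p>It was a dark night.</p>"
    '<img src="moon.png" alt="A full moon"/></body></html>'
)

print(count_words(SAMPLE))                                          # body text only
print(count_words(SAMPLE, include_title=True))                      # plus <title>
print(count_words(SAMPLE, include_title=True, include_attrs=True))  # plus alt/title attributes
```

On this sample the three rules give 5, 7, and 10 words, so the same file yields three different totals before any disagreement about what a word is.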
Old 01-23-2016, 04:02 AM   #11
BetterRed
null operator (he/him)
Quote:
Originally Posted by davidfor View Post
That's even more fun. While looking at how Count Pages did the counting, I forgot to look at what it counted. They are each counting different things.
@davidfor - did you find any counting of angels on pins?

The contentious issue is usually hyphenated words - one word or many, and if many what to do about 'not in dictionary' parts, and the ignoring of SHY.

Calibre's spell checker also checks the spelling in metadata. So if your book is written in EN-US and you're using an EN-US dictionary, but the Comments contain reviews from the LRB and The Torygraph, you're likely to get misspelt words emanating from Comments. Would anyone mark up some of the Comments as EN-GB, assuming one can?

IMO, by default, only the substantial sections of the work should be counted and spell checked, not the scaffolding that glues it together or the marketing blurb.

I should point out that the 'book' in the counts in my earlier posts had no front or back matter, index, bibliography, or notes - it was an essay.

I just did some similar counts on a real book (Bankers New Clothes, Admati and Hellwig) which has lots of notes, some tables and graphs, a bibliography, and a long index. Sigil reports 163,081, Calibre Editor reports 163,959, and 'current official' Count Pages reports 173,007. How does that accord with your research?

BR
Old 01-23-2016, 06:17 AM   #12
davidfor
Grand Sorcerer
Quote:
Originally Posted by BetterRed View Post
@davidfor - did you find any counting of angels on pins?

Quote:
The contentious issue is usually hyphenated words - one word or many, and if many what to do about 'not in dictionary' parts, and the ignoring of SHY.
To my mind, the point of the hyphen is to join things into one word. Hence, it should always be counted as one. When I was experimenting, the problem of what was a hyphen was as much of a problem as anything else.
Quote:
Calibre's spell checker also checks the spelling in metadata. So if your book is written in EN-US and you're using an EN-US dictionary, but the Comments contain reviews from the LRB and The Torygraph, you're likely to get misspelt words emanating from Comments. Would anyone mark up some of the Comments as EN-GB, assuming one can?

IMO, by default, only the substantial sections of the work should be counted and spell checked, not the scaffolding that glues it together or the marketing blurb.
I want to check the spelling on the comments in the metadata, but I don't want them counted as part of the word count. Same goes for the ToC in an epub.
Quote:
I should point out that the 'book' in the counts in my earlier posts had no front or back matter, index, bibliography, or notes - it was an essay.
Yes, and apparently there is a way to mark this in epub3. But, I've never seen a book that did it.
Quote:
I just did some similar counts on a real book (Bankers New Clothes, Admati and Hellwig) which has lots of notes, some tables and graphs, a bibliography, and a long index. Sigil reports 163,081, Calibre Editor reports 163,959, and 'current official' Count Pages reports 173,007. How does that accord with your research?
The calibre editor and Sigil difference will be the metadata and any alt attributes of images. An extra 880 words is probably right there. But 10,000 extra with Count Pages is surprising. They are usually a lot closer. I'm curious, so I've just grabbed the preview from Kobo to test. That has up to chapter 2, so hopefully it will have enough to see what the differences are.
Old 01-23-2016, 07:02 AM   #13
BetterRed
null operator (he/him)
Joining two words with a hyphen is done for different reasons, from Hyphens | Punctuation Rules

Quote:
Rule 5. Never hesitate to add a hyphen if it solves a possible problem. [The following is an example of a well-advised hyphen]:

Confusing: Springfield has little town charm.
With hyphen: Springfield has little-town charm.

Without the hyphen, the sentence seems to say that Springfield is a dreary place. With the hyphen, little-town becomes a compound adjective, making the writer's intention clear: Springfield is a charming small town.

To my mind both those sentences have 5 words. If one was being paid on word count, why should the writer who is savvy enough to insert the 'well-advised' hyphen be penalised; or to put it the other way why should the writer without the wit to do so not be punished - 5 lashes afore the foremast, or a keel hauling at least.

I would count thirty-five, and self-obsessed as one word.

BR

Last edited by BetterRed; 01-23-2016 at 07:06 AM.
Old 01-23-2016, 07:17 AM   #14
theducks
Well trained by Cats
BR
Didn't (paid) word count used to really be character count (excluding whitespace characters) divided by N?

That was before we had markup.
So wouldn't converting to plain text (which removes all tags) and subtracting all the spaces, tabs, and LF/CR get you 'just the letters'?
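
A minimal sketch of that character-based estimate, assuming the text has already been converted to plain text. The post leaves N unspecified, so the default of 5 below is only a placeholder, and `estimated_words` is a name made up for this example.

```python
def estimated_words(text, chars_per_word=5):
    """Estimate a word count as non-whitespace characters divided by N.

    chars_per_word is the N from the post; 5 is a placeholder, not a
    documented standard.
    """
    letters = sum(1 for ch in text if not ch.isspace())
    return letters / chars_per_word


sample = "Springfield has little town charm.\n"
print(estimated_words(sample))  # 30 non-whitespace characters / 5 = 6.0
```

Note that punctuation still counts as a character here; only spaces, tabs, and line endings are dropped.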
Old 01-23-2016, 07:35 AM   #15
davidfor
Grand Sorcerer
BR: At the start of that article, it says:

Code:
Hyphens' main purpose is to glue words together.
Whenever I glue two things together, it is to make a single thing. Hence, the hyphen glues two words together to be one word.

And the devil in me wants to mention that your example is for a "compound adjective". Doesn't that mean the non-hyphenated version should be counted as a single word? Yeah, I'm stretching, but what the hell.

Anyway, the big problem is that without a very complete dictionary and correctly handling the grammar, there is no way to decide between the two. For simplicity, you have to decide that a hyphen is either a word delimiter or part of the word.
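
The two choices can be sketched with a pair of regular expressions. This is only an illustration of the either/or decision, not how any of the tools discussed here tokenise text; the function names are invented, and stripping the soft hyphen (SHY, U+00AD) before counting is an assumption about how "ignoring SHY" would be done.

```python
import re

SOFT_HYPHEN = "\u00ad"  # SHY: an invisible hyphenation hint, not a real hyphen


def count_hyphen_joins(text):
    """Treat a hyphen between word characters as part of the word."""
    cleaned = text.replace(SOFT_HYPHEN, "")
    return len(re.findall(r"\w+(?:-\w+)*", cleaned))


def count_hyphen_splits(text):
    """Treat a hyphen as a word delimiter."""
    cleaned = text.replace(SOFT_HYPHEN, "")
    return len(re.findall(r"\w+", cleaned))


sentence = "Spring\u00adfield has little-town charm."
print(count_hyphen_joins(sentence))   # 4: little-town is one word
print(count_hyphen_splits(sentence))  # 5: little and town are separate
```

Only the ASCII hyphen-minus is handled; a real counter would also have to decide what to do with the other Unicode dash and hyphen characters, which is the "what was a hyphen" problem mentioned earlier in the thread.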