Old 04-18-2011, 11:50 PM   #16
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
There is a word count function embedded inside Calibre, currently used internally for heuristics. With the new plugin framework it should be simple to create a user interface plugin for this. What would people want done with the information? I'm thinking that populating a user-configurable column might be the way to go.
Old 04-19-2011, 03:51 AM   #17
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by ldolse View Post
There is a word count function embedded inside Calibre, currently used internally for heuristics. With the new plugin framework it should be simple to create a user interface plugin for this. What would people want done with the information? I'm thinking that populating a user-configurable column might be the way to go.
Unless the value is precomputed for each book (book, not format), a custom column would be a performance catastrophe. Reason: touching any custom column for a book record causes the computation of them all. Sorting on something would compute the word count for all books in the library. Doing that for 10,000 books could take many minutes.
Old 04-19-2011, 04:27 AM   #18
CCarrot
2 lzy 2 update this...
CCarrot began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Apr 2011
Device: jetbook
Quote:
Originally Posted by chaley View Post
Unless the value is precomputed for each book (book, not format), a custom column would be a performance catastrophe. Reason: touching any custom column for a book record causes the computation of them all. Sorting on something would compute the word count for all books in the library. Doing that for 10,000 books could take many minutes.
Yeeps, that's not good. Good brainstorming, though, thanks ldolse and chaley!

Is there any way to add a metadata field to hold the page or word count, then have it populate on either an add or a database check? It would extend the time for a db check, but that takes some time for 10,000 books anyway, and I wouldn't think it would add too much overhead to the add function, since people aren't likely to try adding that many books at once.

I'm not sure if adding a metadata field to a book is available to the average user, though, or if it would require a plugin. That's outside my expertise, I'm afraid...
Old 04-19-2011, 04:49 AM   #19
Manichean
Wizard
Manichean is the 'tall, dark, handsome stranger' all the fortune-tellers are referring to.
 
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
Quote:
Originally Posted by CCarrot View Post
I'm not sure if adding a metadata field to a book is available to the average user, though, or if it would require a plugin. That's outside my expertise, I'm afraid...
Custom column = custom book metadata field. Thus, the same performance consideration would apply, unless the plugin calculated the number once and stored it in a numerical column. That way, the calculation would be user initiated only.
Old 04-19-2011, 06:04 AM   #20
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by chaley View Post
Unless the value is precomputed for each book (book, not format), a custom column would be a performance catastrophe. Reason: touching any custom column for a book record causes the computation of them all. Sorting on something would compute the word count for all books in the library. Doing that for 10,000 books could take many minutes.
I was thinking the plugin would work this way:
  1. User highlights a list of books and selects 'Calculate Word Count' (or whatever it's called).
  2. Grab one of the book's formats (probably using the preferred format list to choose the best one).
  3. Compute the word count and populate the custom field/column (the field chosen in the plugin config).
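
The three steps above could be sketched roughly as follows. This is only an illustration in plain Python: `PREFERRED_FORMATS`, `pick_format`, `populate_word_counts`, and the dict standing in for the custom column are all hypothetical names, not Calibre's plugin API.

```python
# Hypothetical sketch: none of these names come from Calibre itself.
PREFERRED_FORMATS = ["EPUB", "MOBI", "HTML", "TXT"]  # assumed preference order


def pick_format(available, preferred=PREFERRED_FORMATS):
    """Return the first preferred format the book has, else any format, else None."""
    have = {f.upper() for f in available}
    for fmt in preferred:
        if fmt in have:
            return fmt
    return sorted(have)[0] if have else None


def populate_word_counts(selected_books, count_words, store):
    """For each selected book, count one format and store the result.

    selected_books: dict of book_id -> {format_name: text}
    count_words:    callable taking the format's text and returning an int
    store:          dict standing in for the custom column (book_id -> count)
    """
    for book_id, formats in selected_books.items():
        fmt = pick_format(formats.keys())
        if fmt is None:
            continue  # book has no formats, nothing to count
        text = next(v for k, v in formats.items() if k.upper() == fmt)
        store[book_id] = count_words(text)
    return store
```

The count is only ever written when the user runs the action, so sorting or searching on the stored value never triggers a recount.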

In other words the user would need to manually populate the field by using the plugin - this would be the equivalent of what you mean by pre-computed, correct?

If the user did select several thousand books and ask to compute the word count for them all, this would indeed be a hit of many minutes, but I think it would be a one-time cost this way, right?

My problem might have been my use of the term 'custom column', not sure if that means a column that's dynamically populated to some people.

Last edited by ldolse; 04-19-2011 at 06:11 AM.
Old 04-19-2011, 06:17 AM   #21
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
@ldolse: yes, that is what I mean by precomputed. If the plugin populates an integer (or float?) custom column, then the performance penalty during search & sort is close to zero.

As for how often things are computed: certainly doing a word count on thousands of books will take a while. You might consider storing the date the format was last changed (database2.format_last_modified) in plugin storage (use the per-book persistent data feature). You could then provide an 'update' function that compares the stored date with the format's current date, and recompute the count only if the format has changed.
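
A minimal sketch of that 'update' idea, assuming the stored date and the cached count live in an ordinary dict. The dict merely stands in for the per-book plugin storage; `update_counts` and its arguments are hypothetical names, and how the real last-modified date is fetched is left out.

```python
from datetime import datetime


def update_counts(books, cache, count_words):
    """Recount only the books whose format changed since the cached run.

    books: book_id -> (last_modified: datetime, text: str)
    cache: book_id -> {"mtime": datetime, "words": int}; a stand-in for
           per-book plugin storage (an assumption, not Calibre's API)
    Returns the list of book ids that were actually recounted.
    """
    recounted = []
    for book_id, (mtime, text) in books.items():
        entry = cache.get(book_id)
        if entry is not None and entry["mtime"] == mtime:
            continue  # format unchanged since the last count, skip it
        cache[book_id] = {"mtime": mtime, "words": count_words(text)}
        recounted.append(book_id)
    return recounted
```

Running it twice in a row on an unchanged library recounts nothing the second time, which is the point of storing the date.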

You might also consider working through the formats in some order until you find one that is not DRM infested.
Old 04-19-2011, 06:31 AM   #22
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by chaley View Post
@ldolse: yes, that is what I mean by precomputed. If the plugin populates an integer (or float?) custom column, then the performance penalty during search & sort is close to zero.

As for how often things are computed: certainly doing a word count on thousands of books will take a while. You might consider storing the date the format was last changed (database2.format_last_modified) in plugin storage (use the per-book persistent data feature). You could then provide an 'update' function that compares the stored date with the format's current date, and recompute the count only if the format has changed.

You might also consider working through the formats in some order until you find one that is not DRM infested.
Cool, sounds like a plan then. I'll start simple and if it seems popular I'll look into those additional niceties.
Old 04-19-2011, 06:45 AM   #23
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
Posts: 31,057
Karma: 60358908
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
My 2 cents
What don't you count when you calculate the word count?
Front matter (copyright, books by author...)? Teasers for other books? Pages of 'Rave reviews'? Sample chapters? Glossary? Dramatis Personae?

I am trying to think of all the other stuff I frequently see other than the Story (Foreword through Epilogue).

I guess I never worry about word count.
When I get to 'The End', I am done
Old 04-19-2011, 06:55 AM   #24
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by theducks View Post
My 2 cents
What don't you count when you calculate the word count?
Front matter (copyright, books by author...)? Teasers for other books? Pages of 'Rave reviews'? Sample chapters? Glossary? Dramatis Personae?

I am trying to think of all the other stuff I frequently see other than the Story (Foreword through Epilogue).

I guess I never worry about word count.
When I get to 'The End', I am done
I wasn't thinking about trying to get fancy and detect where the book actually starts. The way I do it in heuristics is to just delete all the HTML tags with a regex (potentially error-prone, but faster than proper text conversion). Then I just count everything that's left. The way I'm thinking of doing this is basically identical to the Extract ISBN plugin's mechanism, so the regex approach would actually be required, as the function the plugin relies on creates a book preview that's incompatible with the conversion engine.
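
For illustration, a rough sketch of the strip-tags-then-count idea. This is not the actual heuristics code in Calibre; `TAG_RE` and `rough_word_count` are names made up for this example, and the pattern is deliberately naive.

```python
import re

# Naive pattern: anything between angle brackets is treated as a tag.
# It will be fooled by a stray '<' or '>' in the text or in attribute values.
TAG_RE = re.compile(r"<[^>]*>")


def rough_word_count(html):
    """Strip tags with a regex, then count whitespace-separated tokens."""
    # Replace each tag with a space so that '</p><p>' does not glue
    # the last word of one paragraph to the first word of the next.
    text = TAG_RE.sub(" ", html)
    return len(text.split())
```

Entities such as `&nbsp;` are left in place and can throw the count off slightly, which is fine if the goal is only a rough idea of length.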
Old 04-19-2011, 11:46 AM   #25
CCarrot
2 lzy 2 update this...
CCarrot began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Apr 2011
Device: jetbook
Quote:
Originally Posted by ldolse View Post
The way I do it in heuristics is just delete all the html tags with a regex (potentially error prone but faster than proper text conversion). Then I just count everything that's left.
That sounds like a sensible way to go about it. I know I'm not looking for an *exact* word count, just an idea of proportionally how 'long' the book is compared to other books (or short stories) in the collection. File size is misleading in this respect, since it includes the graphics which, while nice to have, don't impact the 'reading length' of the book.

If we had access to a stored word count for each book, that could be converted to approximate page count easily by dividing by 200 (or 500 or whatever the typical number of words on a page is, idk).
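
As a quick sketch of that conversion (the function name `estimate_pages` is made up, and the words-per-page figure is a guess the user would tune):

```python
def estimate_pages(word_count, words_per_page=200):
    """Approximate page count, rounded up to a whole page."""
    if word_count <= 0:
        return 0
    # Ceiling division without importing math.
    return -(-word_count // words_per_page)
```

With the default of 200 words per page, an 80,000-word novel comes out at 400 pages.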

One quick question though: would the calculated word counts be persistent, or would they have to be re-calculated (at the user's request of course) each time the program starts up?

Thanks!
cc
Old 04-19-2011, 11:56 AM   #26
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
It would persist as part of the book's metadata.
Old 04-19-2011, 12:04 PM   #27
CCarrot
2 lzy 2 update this...
CCarrot began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Apr 2011
Device: jetbook
Quote:
Originally Posted by ldolse View Post
It would persist as part of the book's metadata.
:
Old 04-19-2011, 12:57 PM   #28
kiwidude
Calibre Plugins Developer
kiwidude ought to be getting tired of karma fortunes by now.
 
Posts: 4,729
Karma: 2197770
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
@ldolse - if you are going to write this, you might want to check with user_none as per this post, since it sounds like both functions could be rolled into one.
Old 04-19-2011, 01:03 PM   #29
ldolse
Wizard
ldolse is an accomplished Snipe hunter.
 
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by kiwidude View Post
@ldolse - if you are going to write this, you might want to check with user_none as per this post, since it sounds like both functions could be rolled into one.
I'll check with user_none before I go to town on it then - I'm sort of thinking this could grow into a general statistics plugin with a number of stats that could be gathered.
Old 04-22-2011, 01:31 AM   #30
cybmole
Wizard
cybmole ought to be getting tired of karma fortunes by now.
 
Posts: 3,720
Karma: 1759970
Join Date: Sep 2010
Device: none
MS Word used to compute a readability factor (maybe it still does, but I've not looked), based IIRC on stuff like the number of words per sentence.
Anyway, the stats generated by Word (and how it computes them) are a possible starting point for compiling a wish list of useful(?) book info.
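
One such stat, average words per sentence, could be approximated like this. It is a sketch only: it is not how Word computes its readability figures, and the sentence splitting is crude (abbreviations like "Mr." will be miscounted). The name `average_sentence_length` is made up for the example.

```python
import re


def average_sentence_length(text):
    """Mean number of whitespace-separated words per sentence."""
    # Crude assumption: a sentence ends at any run of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    words = sum(len(s.split()) for s in sentences)
    return words / len(sentences)
```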
All times are GMT -4.