05-24-2011, 04:50 AM   #16
kiwidude
calibre/Sigil Developer
Posts: 4,224
Karma: 1334002
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
Remember that the viewer has the worst page estimate of all: it is horribly inaccurate because it uses the "Adobe standard" of 1024 characters per page, if I remember correctly what user_none said. So a count that is 50% of the viewer's value is certainly not unusual, and it is far closer to the printed edition in most cases.
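
As a rough illustration of the fixed-characters-per-page idea described above, here is a minimal Python sketch. It is not the viewer's actual code: the function name `adobe_style_pages` is hypothetical, and it assumes the count is taken over plain characters (the post does not say whether markup or compression is involved).

```python
import math

CHARS_PER_PAGE = 1024  # the "Adobe standard" figure quoted in the post


def adobe_style_pages(text: str, chars_per_page: int = CHARS_PER_PAGE) -> int:
    """Estimate pages by dividing the character count by a fixed page size.

    Assumption: `text` is whatever character stream is being counted;
    the real viewer may count something different.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_page)
```

Because the divisor is fixed and small, a long novel quickly runs to well over a thousand "pages", which fits the observation that this style of count tends to come out at roughly double the print edition.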

The whole intent of this plugin is an at-a-glance estimate in your view, so you can see a 1500 vs 750 vs 300 vs 20 page type of comparison. That the 1500 might actually be 1800, and the 300 really 200, in print isn't the relevant issue. I changed the ePub algorithm so that you would hopefully get roughly consistent results between mobi and ePub formats, in case your library contains a mixture where some books lack one format or the other.

The only way to get real-world printed page values is to read them off a web page, as a metadata source plugin would. I would like to do this one day, but it requires non-trivial changes to the metadata API in calibre, which are more than I can take on right now.

Btw, I know this plugin will throw an error if you try to count pages for a DRM-protected book, so use Quality Check to exclude those.

05-24-2011, 04:54 AM   #17
drMerry
Addict
Posts: 293
Karma: 21022
Join Date: Mar 2011
Location: NL
Device: Sony PRS-650
Any idea why these books show up as 807 and 590 while there is just a small difference in the content?

05-24-2011, 05:31 AM   #18
kiwidude
calibre/Sigil Developer
@drMerry - What is similar to you is obviously not similar as far as the algorithm is concerned.

If one day user_none decides on a different or modified approach to calculating apnx files then I will follow suit.

The only alternative I have is to ditch his APNX calculation for mobi and do my own thing for both formats. The one thing I could do with that is offer a possibly more consistent calculation between books, by stripping all HTML markup and then dividing the remaining length by a fixed page size, as the Adobe one does. I know he constrained his calculation for performance reasons and was limited in his options. However, the algorithm does use a lines-per-page calculation, using the markup tags to indicate ends of lines and so on. That works well sometimes and clearly not at other times, depending on how much markup there is and how many 70-character lines there are.
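
To make the lines-per-page idea concrete, here is a hedged Python sketch of that general technique. It is not user_none's APNX code: `line_based_pages` is a hypothetical name, the 32 lines per page is an invented figure for illustration, and only closing `</p>` and `<br>` tags are treated as line endings.

```python
import math
import re

LINE_WIDTH = 70       # characters per line, the figure mentioned in the post
LINES_PER_PAGE = 32   # hypothetical value, chosen only for illustration

_TAG = re.compile(r"<[^>]*>")
# Only closing </p> and <br> tags end a line here; <div> is deliberately ignored.
_LINE_END = re.compile(r"<\s*(?:/\s*p|br\s*/?)\s*>", re.I)


def line_based_pages(html: str) -> int:
    """Count 70-character lines, ending a line early at </p> or <br>."""
    lines = 0
    run = 0   # characters seen since the last line ending
    pos = 0
    for tag in _TAG.finditer(html):
        run += tag.start() - pos      # text between the previous tag and this one
        pos = tag.end()
        if _LINE_END.fullmatch(tag.group(0)):
            lines += max(1, math.ceil(run / LINE_WIDTH))
            run = 0
    run += len(html) - pos            # trailing text after the last tag
    if run:
        lines += math.ceil(run / LINE_WIDTH)
    return math.ceil(lines / LINES_PER_PAGE)
```

With these assumptions, 64 short paragraphs written as `<p>short</p>` count as 64 lines and therefore 2 pages, while the same text in `<div>short</div>` never hits a line ending, collapses into 5 long lines, and comes out as 1 page. That is one way the amount and kind of markup can swing the result.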

So at least currently a Kindle user will get a result similar to the one in the APNX file generated by calibre. If I "go rogue" and do my own calculation then, counting the ebook viewer, we will have a third interpretation of page count, and none any more accurate than the last.

05-24-2011, 06:54 AM   #19
drMerry
Addict
I did not know you look at the source. That can of course be very different for the same book (just think of a hand-made website and a website exported by Word: they can look similar, but if you see the source....)

So I understand the basics of your calculation.
I love this plugin!!! Thanks!

05-24-2011, 07:03 AM   #20
tilia
Addict
Posts: 385
Karma: 1441403
Join Date: Mar 2011
Device: Kindle 3
Nice! A few strange results, but much better than file-size.

I counted the icons on my toolbar: 12 of 23 were for your plug-ins...

Thank You!

05-24-2011, 08:25 PM   #21
The Terminator
Retired
Posts: 2,552
Karma: 37638420
Join Date: Nov 2010
Location: Vancouver Island Canada
Device: Kobo Touch, Optimus One (2.3), Nexus 7 (4.2)
Love it! Great plug-in!

05-26-2011, 02:44 PM   #22
kiwidude
calibre/Sigil Developer
v1.0.3 Released

Changes in this release:
  • Offer choice of algorithms to match eBook viewer or APNX generation (default)
  • Ensure DRM encrypted books do not cause errors

I decided to give people a choice of algorithms - the default remains the same as in the last release, which is the APNX line-count-based one. However, should you wish, you can instead use the Adobe-based one that the calibre ebook viewer uses - the numbers will likely be a lot higher, but people who actually read their books using the calibre viewer might prefer that.

I contemplated adding a third algorithm, based on stripping all the HTML markup. I tried a few variants, such as lines-based or just dividing by a number of characters per page. I think it gave slightly more consistent results, but not significantly different in most cases from the Adobe-based one. Of course you can change the number of characters per "page" to bring the numbers down closer to reality/the APNX-based one, but it is all just fudging numbers, so I ripped it out.
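
For readers curious what such a markup-stripping variant might look like, here is a minimal sketch using only the Python standard library. It is an assumption-laden illustration, not the code that was removed from the plugin: `stripped_text_pages` and its `chars_per_page` parameter are hypothetical names.

```python
import math
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect only the text nodes, discarding every tag and attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def stripped_text_pages(html: str, chars_per_page: int = 1024) -> int:
    """Strip markup, collapse whitespace, then divide by a page size."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return math.ceil(len(text) / chars_per_page) if text else 0
```

The `chars_per_page` knob is exactly the "fudging" mentioned above: doubling it halves every count, without making any individual estimate more accurate.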

If someone comes up with an algorithm that they and others agree would offer more consistent results then it can always be added.

06-02-2011, 06:44 AM   #23
hakan42
Zealot
Posts: 136
Karma: 60
Join Date: Jul 2009
Location: Munich, Germany
Device: Nook Classic rooted; Galaxy S IV with Aldiko, other older devices
Excellent plugin, thank you.

On one hand, it happily runs in the background now, computing that useful page size information for me.

On the other hand, having the source of it available removes the necessity for some stupid questions I had while thinking about my bookshelf syncher plugin

06-02-2011, 12:13 PM   #24
CRussel
Wizard
Posts: 4,612
Karma: 30303932
Join Date: Jul 2010
Location: Sunshine Coast, BC
Device: Kindle PW, Fire HDX 8.9, Fire HD8.9, Fire 7HD, Surface 2
I have a book (originally .azw, converted to .mobi and DRM removed) that behaves weirdly. The default APNX algorithm counts it as 2 pages. The Adobe algorithm counts it as 1372 pages. (About twice what it should be. Hardcover is 624 pages, according to Amazon.) The book is: House Name by Michelle West.

06-02-2011, 12:21 PM   #25
kiwidude
calibre/Sigil Developer
@CRussel - see all my previous comments on "accuracy" - that the Adobe algorithm comes out at about twice the paperback size sounds "correct" from my own comparisons.

As to why the APNX calculation for a MOBI file would come out at 2 pages, my guess is that the book internally does not use <p> tags for paragraphs. Take a look at its content to confirm that is the case. If it is the MOBI format that it is counting the pages for, then the issue lies with the APNX calculation code written by user_none; you could try posting on either the Kindle page count thread in the Devices forum or on his APNX plugin thread. He is aware that his calculation does not, for instance, consider <div> based paragraphs that some books use instead of <p> based ones.
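
One quick way to check which tags a book relies on is to count them. The sketch below assumes you have already extracted the book's HTML into a string by some other means (how to do that is outside this example), and `paragraph_tag_counts` is a hypothetical helper name.

```python
import re


def paragraph_tag_counts(html: str) -> dict:
    """Count opening <p>, <div> and <br> tags in a chunk of HTML."""
    return {
        # [\s>] after the name avoids matching <pre>, <param> and similar tags
        "p": len(re.findall(r"<p[\s>]", html, re.I)),
        "div": len(re.findall(r"<div[\s>]", html, re.I)),
        "br": len(re.findall(r"<br[\s/>]", html, re.I)),
    }
```

A book where `div` is in the thousands and `p` is near zero would be consistent with the guess above.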

06-02-2011, 06:59 PM   #26
CRussel
Wizard
OK. Yes, I wasn't surprised that the Adobe algorithm was 2x -- that's been well discussed here and I expected it. But I was surprised by the number from the APNX calculation. I can certainly work around it, since it's an Amazon book and the page number is available. But I was hoping to avoid this for most of my books. So far, it seems a limited number are having the issue.

06-02-2011, 07:45 PM   #27
kiwidude
calibre/Sigil Developer
If you have looked at the APNX calculation as I have you will understand when it will go badly wrong - and that is as I said if the book uses <div> tags instead of <p> tags to define paragraphs.

As an experiment, convert your mobi to epub, make sure the plugin is set up to look at epub first (or create a new book record with just the epub on it), and then get the page count. I would be interested to hear the number you get. And as I said above, you should look at the detail of the page to find out what your paragraph delimiters are.

The ePub calculation is based on the APNX one, but I added some extra tweaks to attempt to detect the scenario I described above. Perhaps if you get a number that is more reasonable, user_none might be convinced to change his APNX calculation to do something similar - or something even better, in which case I can steal it.

There is still one other situation which I know both will fail on - and that is books which use <br/><br/> tags to define the end of a paragraph, i.e. there is no known enclosing tag around the paragraph. In that situation my epub calculation will go horribly wrong too. There is only so much we can do - if people/tools do non-standard things in formatting the books then downstream approximation hacks like this will get tripped up. If you fix the book in Sigil, for instance, and properly enclose the paragraphs with some regex find/replacing, then you can run the page count afterwards.
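
The kind of regex clean-up suggested above can be sketched in Python as follows. This is only an illustration of the idea, not a Sigil feature or a tested recipe: `wrap_br_paragraphs` is a hypothetical name, and it assumes it is given a body fragment in which paragraphs are separated by two or more consecutive `<br>` tags.

```python
import re

# Two or more consecutive <br>, <br/> or <br /> tags, with optional whitespace
_DOUBLE_BR = re.compile(r"(?:<br\s*/?>\s*){2,}", re.I)


def wrap_br_paragraphs(fragment: str) -> str:
    """Split on runs of <br> tags and wrap each chunk in <p>...</p>."""
    chunks = [chunk.strip() for chunk in _DOUBLE_BR.split(fragment)]
    return "\n".join(f"<p>{chunk}</p>" for chunk in chunks if chunk)
```

A single `<br/>` inside a paragraph is left alone, so deliberate line breaks survive. Anything more irregular than this pattern would still need checking by hand before re-running the count.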

Last edited by kiwidude; 06-02-2011 at 07:47 PM.

06-03-2011, 09:45 AM   #28
CRussel
Wizard
OK, I took another one that came up that way (a MobileRead book by Josephine Tey that I had in mobi only), converted to ePub, and changed the preference to ePub first. Ran the page count again, and it went from a count of 1 to a count of 220, which feels about right.

It would be really helpful with this plugin if you could choose configuration at the point of counting or at least from the menu, without having to go into Preferences and change it.

06-03-2011, 09:54 AM   #29
kiwidude
calibre/Sigil Developer
Ok, so that sounds like what I thought then (since you haven't actually confirmed this at any point) - that your book has div tags instead of p tags. As I said above, go pester user_none into seeing if he wants to make a change to his APNX code.

As for switching the option, I have tried to keep this like Extract ISBN as a plugin not requiring a menu. For myself, I keep only ePub and Mobi formats in my library. I only generate a page count on the books I have "cleaned up", which means I have both formats available. I have my preference set to ePub and have no need to ever change it since my ePub calculations are "more encompassing" than the Mobi based ones as described above.

If in your case you only keep mobi files, but want the odd ePub file in case you hit this problem again, just leave the setting at ePub. All it means is that it will look at the ePub format first if the book has one, and then fall back to mobi if it does not.
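
The preference-with-fallback behaviour described above amounts to something like the following sketch. The function `pick_format` and its parameters are hypothetical and are not taken from the plugin's source.

```python
def pick_format(available, preferred="EPUB", countable=("EPUB", "MOBI")):
    """Return the preferred format if the book has it, else the next countable one.

    `available` is an iterable of format names for one book (any case).
    Returns None when the book has no countable format at all.
    """
    have = {fmt.upper() for fmt in available}
    preferred = preferred.upper()
    order = [preferred] + [fmt for fmt in countable if fmt.upper() != preferred]
    for fmt in order:
        if fmt.upper() in have:
            return fmt.upper()
    return None
```

So a mobi-only book is still counted from its MOBI file, and the ePub is used only when one exists.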

There isn't really a good reason I can think of to set it to mobi. The only remote possibility I allowed for was someone who was trying to get the same numbers appearing as they would get for the apnx files generated. As you have found that isn't always a good thing currently.

06-03-2011, 11:00 AM   #30
CRussel
Wizard
Ah, good point, I hadn't thought it out fully. Yes, I'll set it to epub and leave it there, since I clean up all books as soon as I buy them. If the book has an obviously wrong page count, I'll just convert to epub and do it again. Now that I've got my full Calibre library with page counts, the process is hardly onerous. Almost all my books are mobi first, with the exception of the occasional public library book or Kobo book, so I don't keep a lot of ePub in my Calibre library, but it's not a big deal to add them as necessary.

Thanks for the plugin!