Old 10-26-2011, 03:11 PM   #151
kiwidude
calibre/Sigil Developer
Posts: 4,220
Karma: 1333994
Join Date: Oct 2010
Location: London, UK
Device: Kindle Paperwhite 3G, iPad 3, iPad Air
@Zetmolm - I don't know anything about the FB2 format. However unless there is some magical helper class in Calibre that exposes some HTML page content from it internally (like there is for MOBI), then the only way to get a page count would be from doing a conversion to EPUB. However this could be done internally by the plugin, so you don't actually "keep" the EPUB version in your library.

@Vaesse - good reminder. That one should be a little easier to do, I'll take a look.

@ElMiko - there are a few other points to think about. I think I have to support a fallback option (since short stories or books not on Goodreads etc. will not otherwise get a page count). That has two complications - firstly how to handle it from a UI/configuration perspective, and secondly whether people will want/care about the fact that some of their page counts will be paper book based and others ebook based. They will not necessarily know (without peering at the log each time) which it is.

From a configuration perspective, I am going to drop the ePub/Mobi selection, and just use the "Preferred input format order". It was what I originally did with this plugin I think, and it is necessary if I am going to support counting from other formats.
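As a rough illustration of what "Preferred input format order" implies, the sketch below picks the first format in the preference list that a book actually has. The function name and arguments are hypothetical and are not taken from the plugin or from Calibre.

```python
# Hypothetical helper: the names here are illustrative, not the plugin's API.
def pick_format(available_formats, preferred_order):
    """Return the first format in preferred_order that the book has, else None."""
    available = {fmt.upper() for fmt in available_formats}
    for fmt in preferred_order:
        if fmt.upper() in available:
            return fmt.upper()
    return None


# A book with both MOBI and EPUB, where the user prefers EPUB first.
chosen = pick_format(["mobi", "epub"], ["EPUB", "MOBI", "FB2"])
```

With a single ordered list there is no need for a separate ePub/Mobi selector: moving a format up the list changes which file gets counted.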

I think rather than adding to the existing dropdown of page count algorithms, I will add a new one above it for "Retrieve from web" with options of "No website lookup", and "Goodreads" for now. I may add other websites like Amazon later, but Goodreads is one of the best for coverage, well for English books anyway!

So if a user picks "No website lookup", you get the behaviour you have today. If they choose "Goodreads" and no book match is found, then it will fallback to use the standard pagecount algorithm you have selected now.
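The lookup-then-fallback behaviour described here could be sketched roughly as follows. This is only an assumption about the shape of the logic: `lookup` and `estimate` are hypothetical callables standing in for the website query and the existing page count algorithm, and returning the origin alongside the number is one possible way to tell paper-based counts from estimated ones.

```python
from typing import Callable, Optional, Tuple


def get_page_count(book,
                   web_source: str,
                   lookup: Callable[[object], Optional[int]],
                   estimate: Callable[[object], int]) -> Tuple[int, str]:
    """Return (page_count, origin), trying the website before the estimate."""
    if web_source != "No website lookup":
        pages = lookup(book)  # hypothetical: returns None when no match is found
        if pages:
            return pages, web_source
    # No lookup configured, or no match: use the selected counting algorithm.
    return estimate(book), "estimate"
```

Storing the origin somewhere visible would avoid having to peer at the log to find out where a given count came from.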

From a wordcount perspective, nothing changes in that you cannot get that from a website. However I will consider adding some other formats - either directly like txt files, or indirectly by doing an internal conversion to ePub if there is no ePub format already like I mentioned for FB2 support above.

Finally people should bear in mind that getting from a website is not a guarantee that it will be any more "accurate" with the number they "had in mind". As I have said previously on this thread, there is no "one number" for a page count - large vs small print vs revised editions can result in quite a variance. The other risk is that the book it picks up is not the one you think - for instance if the plugin matches an omnibus edition. But it should get it close enough most of the time.
Old 10-27-2011, 06:42 AM   #152
Zetmolm
Evangelist
Posts: 470
Karma: 893566
Join Date: Jan 2010
Device: Onyx Boox 60, PocketBook Touch
Kiwidude, thanks for your comments on my feature request. Internal conversion to ePub before doing the count would be a solution that certainly meets my needs. And wouldn't it also have the additional benefit of producing consistent word counts, because in the end only ePubs are counted? Thanks!
Old 11-06-2011, 12:08 AM   #153
Nyssa
Series Addict
Posts: 5,386
Karma: 51840737
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle 3 (Wifi Only), Kindle Paperwhite
Hi Kiwidude.

I've been using the plugin since day one and have been using only word count for months now. I decided to add page count again and am getting weird numbers.

My files are all MOBI, being converted automatically from AZW, etc. when added to Calibre.

The first book I tried has an 83,241 word count, but only a 3 pg count. I tried converting it from MOBI to MOBI, and now it says it has a 6 pg count. Any ideas?
Old 11-06-2011, 01:31 AM   #154
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Quote:
Originally Posted by Nyssa View Post
Hi Kiwidude.

I've been using the plugin since day one and have been using only word count for months now. I decided to add page count again and am getting weird numbers.

My files are all MOBI, being converted automatically from AZW, etc. when added to Calibre.

The first book I tried has an 83,241 word count, but only a 3 pg count. I tried converting it from MOBI to MOBI, and now it says it has a 6 pg count. Any ideas?
Did you try going beyond the first book? Some books just don't work well with the page count algorithms, as the concept of a 'page' in an ebook is a very strange thing in general. The odds are that under the hood the particular book you chose uses a markup syntax that isn't compatible with page count. The vast majority of books should give you better data.
Old 11-06-2011, 05:58 AM   #155
Nyssa
Series Addict
Quote:
Originally Posted by ldolse View Post
Did you try going beyond the first book? Some books just don't work well with the page count algorithms, as the concept of a 'page' in an ebook is a very strange thing in general. The odds are that under the hood the particular book you chose uses a markup syntax that isn't compatible with page count. The vast majority of books should give you better data.
Yes, I did 20 of them as a test run. At least six of them look suspicious as either having too many pages or too few. I don't know if comparing them to the Kindle's page count would be helpful.
Old 11-06-2011, 12:34 PM   #156
kiwidude
calibre/Sigil Developer
@Nyssa - PM me a link to a couple of those files so I can confirm what is going on. At a guess without seeing them I would say it is because of the following...

For counting pages in MOBI files, the plugin uses the Calibre code (the same code used when you send a MOBI file to your Kindle and get it to generate an APNX file). This algorithm is based on counting the number of <p> paragraph tags.

However some books have the bulk of their content based on <div> tags instead. For ePub based page counting, I have an additional check which compares the number of div tags with the number of p tags - whichever is greater it uses as the basis for the count. This should give a more accurate count for div based books.
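A minimal sketch of that comparison, assuming the book's HTML is already available as a string: count the opening <p> and <div> tags and take whichever is greater. This is not the plugin's actual code, and how the resulting count is turned into a page number is left out.

```python
import re


def dominant_tag_count(html: str) -> int:
    """Count opening <p> and <div> tags and return the larger of the two."""
    # [\s>] after the tag name keeps <pre> or <param> from being counted as <p>.
    p_count = len(re.findall(r"<p[\s>]", html, flags=re.IGNORECASE))
    div_count = len(re.findall(r"<div[\s>]", html, flags=re.IGNORECASE))
    return max(p_count, div_count)
```

For a div-based book the <p> count alone can be close to zero, which is consistent with an 83,241 word book reporting only a handful of pages.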

Note that logic only exists for epub books - for those files you are having problems with, try a conversion to ePub, make sure the plugin is configured to "prefer" the ePub version and see what count it comes up with.

Arguably the Calibre mobi page counting logic should be improved to try the same technique of comparing div with p tags and using whichever is bigger. However I leave that to user_none if he considers that desirable.

In a future version of this plugin, as mentioned above, I think I am going to *only* support counting pages based on the plugin doing a temporary conversion to ePub, so it uses my "tweaked" algorithms for estimations. This would also mean that a page count could be determined for books for which you don't have (or don't want to convert to) ePub or MOBI formats, such as FB2 as mentioned above. However it would also mean that someone who only used the MOBI format would not get the same page count showing in Calibre as they would on their Kindle. Whether that is really an "issue" is questionable, and in exchange the plugin would be useful to more users.
Old 11-06-2011, 02:27 PM   #157
kiwidude
calibre/Sigil Developer
@Nyssa - thanks for the links.

I only looked at one of the books, but it confirms my theory above. In actual fact it shows a third combination not catered for - a book that uses <blockquote> tags as paragraph tags!

If you convert those books to ePub, you will find you get a better count. Calibre still has a <blockquote> in the ePub, but its conversion also puts <p> tags around the <br/> in between the blockquotes, so you end up with a roughly equivalent count. You can get a better count estimate with two regex replacements in the epub file - replacing the blockquote with p, then removing the pointless <p><br class="calibre1" /></p> entries. This allows the algorithm to handle the case of "very long paragraphs" to add to the page count, which are otherwise missed.
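The two regex replacements could look something like the sketch below. The patterns are an assumption based on the markup quoted above (a `calibre1` spacer paragraph between blockquotes); real books may need different expressions, and the function name is made up for illustration.

```python
import re


def normalise_blockquotes(html: str) -> str:
    """Rewrite blockquote-based paragraphs as <p> and drop empty spacer paragraphs."""
    # 1. Turn <blockquote ...> and </blockquote> into <p ...> and </p>.
    html = re.sub(r"<(/?)blockquote\b", r"<\1p", html, flags=re.IGNORECASE)
    # 2. Remove paragraphs that contain nothing but a <br/>.
    html = re.sub(r"<p(?:\s[^>]*)?>\s*<br[^>]*>\s*</p>", "", html,
                  flags=re.IGNORECASE)
    return html


sample = ('<blockquote class="calibre2">One</blockquote>'
          '<p><br class="calibre1" /></p>'
          '<blockquote class="calibre2">Two</blockquote>')
cleaned = normalise_blockquotes(sample)
```

After this, each real paragraph is a single <p> element, so a count based on <p> tags sees the actual text rather than the spacers.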

A tweak I could make to this plugin is to consider <blockquote> another permutation, and compare how many are found in the doc in the same way I do with <p> and <div> currently, which would also remove the need for the Sigil tweaking of the ePub conversion.

However no matter what I do about the above, your fundamental issues lie with user_none's implementation of the mobi page counting algorithm, because you don't store ePub versions in your library. So either you campaign for user_none to support books that are based on any of <p>, <div> or <blockquote> tags, or this plugin gets changed to not use user_none's algorithm at all.
Old 11-06-2011, 02:59 PM   #158
Nyssa
Series Addict
Thank you for taking a look.

Edit: I would campaign if I had any idea as to what I was talking about.

I could conceivably have both MOBI and ePub versions in Calibre, but I have no idea what that would do to its performance level and I would have no use for ePub files outside of corrected page numbers.

Last edited by Nyssa; 11-06-2011 at 03:03 PM.
Old 11-06-2011, 03:06 PM   #159
Nyssa
Series Addict
Would it make sense to convert all of the books to ePub, pull all of the word counts and page numbers, and then delete/remove all of the ePub versions from Calibre?
Old 11-07-2011, 01:03 AM   #160
DoctorOhh
US Navy, Retired
Posts: 8,561
Karma: 12369681
Join Date: Feb 2009
Location: North Carolina
Device: Nexus 7
Quote:
Originally Posted by Nyssa View Post
Would it make sense
Only you could answer if it would make sense. It certainly can't hurt anything. Try it on a few, if the results are close to what you are looking for then continue on to the rest.
Old 11-07-2011, 03:38 AM   #161
atjnjk
Zealot
Posts: 105
Karma: 554
Join Date: Oct 2008
Device: none
After some testing (when this plugin was new), I found that the Calibre Viewer (Adobe) algorithm is more consistent for mobi files than APNX. Although the page count is nowhere near the actual book (the trick is to divide it by 2), it's still useful for comparing ebooks.
@kiwidude: IMHO, converting mobi to epub to count pages is slower than the current method, so I really do hope that you could leave a no-conversion option. It's nice to know that more users (who are not using mobi or epub format) are going to enjoy this wonderful plugin though.
Old 11-07-2011, 03:48 AM   #162
DoctorOhh
US Navy, Retired
Quote:
Originally Posted by atjnjk View Post
After some testing (when this plugin was new), I found that the Calibre Viewer (Adobe) algorithm is more consistent for mobi files than APNX. Although the page count is nowhere near the actual book (the trick is to divide it by 2), it's still useful for comparing ebooks.
The calibre viewer and Adobe counts are quite different. The calibre viewer has a count almost twice that of the Adobe ePub renderer on any of my Sony devices. That said, all I care about is a page count that is consistent.
Old 11-07-2011, 05:43 AM   #163
Nyssa
Series Addict
I went ahead, converted everything and ran the word & page counts. The numbers make a lot more sense now. I have not deleted the ePub versions, however. If calibre slows down significantly then I'll reconsider. I'm not too happy at the thought of having to manually convert every book I add to my library, though. I don't know if I can get calibre to perform two conversions at once - to Mobi and then another to ePub.
Old 11-07-2011, 08:03 AM   #164
unboggling
by the bootstraps
Posts: 1,051
Karma: 858115
Join Date: Jan 2011
Location: Southeast US
Device: PRS-T2, Nexus 7, KindleT, iPad1, Kindle3KB
Quote:
Originally Posted by Nyssa View Post
I went ahead, converted everything and ran the word & page counts. The numbers make a lot more sense now. I have not deleted the ePub versions, however. If calibre slows down significantly then I'll reconsider. I'm not too happy at the thought of having to manually convert every book I add to my library, though. I don't know if I can get calibre to perform two conversions at once - to Mobi and then another to ePub.
I keep only EPUBs rather than MOBIs even though I read mostly on Kindle. It's easy for calibre to automatically convert on the fly from EPUB to MOBI just before loading the device.
Old 11-07-2011, 04:45 PM   #165
kiwidude
calibre/Sigil Developer
@Nyssa - as I have said above there is very little I can do, beyond not using user_none's code for MOBI files. And even if I go down that route, you will still have the problem that when you send the book to your Kindle, the page count from the generated APNX file will be the same wildly inaccurate value for those books.

This has all been discussed previously on this thread (start looking at page 2). IIRC user_none was reluctant to change the existing logic because his focus was on performance; he did not want to significantly slow down the sending to device process.

The only resolution I can think of that "might" keep everyone happy, is offering an option in the APNX file generation to pull the value from a custom column, rather than always generating using its (clearly drastically flawed in some cases) current counting implementation. That way this plugin can use whatever approach it wants to (such as pulling from a website, with a fallback to counting based on the max of <div>,<p> or <blockquote> tags). And then when you send the book to your Kindle, this "better" page count value could be used on your Kindle.

How much work that would be to implement if it is agreed, and whether user_none has an interest or is willing to have the existing code changed, is quite another issue; it would impact his standalone APNX file generator plugin as well.