10-26-2011, 03:11 PM | #151 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Zetmolm - I don't know anything about the FB2 format. However unless there is some magical helper class in Calibre that exposes some HTML page content from it internally (like there is for MOBI), then the only way to get a page count would be from doing a conversion to EPUB. However this could be done internally by the plugin, so you don't actually "keep" the EPUB version in your library.
@Vaesse - good reminder. That one should be a little easier to do, I'll take a look.
@ElMiko - there are a few other points to think about. I think I have to support a fallback option (since short stories or books not on Goodreads etc. will not otherwise get a page count). That has two complications: firstly how to handle it from a UI/configuration perspective, and secondly whether people will care that some of their page counts will be based on the paper book and others on an ebook count. They will not necessarily know (without peering at the log each time) which it is.
From a configuration perspective, I am going to drop the ePub/Mobi selection and just use the "Preferred input format order". I think that is what I originally did with this plugin, and it is necessary if I am going to support counting from other formats.
Rather than adding to the existing dropdown of page count algorithms, I think I will add a new one above it for "Retrieve from web", with options of "No website lookup" and "Goodreads" for now. I may add other websites like Amazon later, but Goodreads is one of the best for coverage, well for English books anyway! So if a user picks "No website lookup", you get the behaviour you have today. If they choose "Goodreads" and no book match is found, then it will fall back to the standard page count algorithm you have selected now.
From a word count perspective, nothing changes, in that you cannot get that from a website. However I will consider adding some other formats - either directly, like txt files, or indirectly by doing an internal conversion to ePub if there is no ePub format already, like I mentioned for FB2 support above.
Finally, people should bear in mind that getting the number from a website is no guarantee that it will be any more "accurate" compared with the number they "had in mind". As I have said previously on this thread, there is no "one number" for a page count - large vs small print vs revised editions can result in quite a variance. The other risk is that the book it picks up is not the one you think - for instance if the plugin matches an omnibus edition. But it should get it close enough most of the time. |
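The "Retrieve from web" behaviour described in post #151 could be sketched roughly as follows. This is only an illustration of the fallback idea, not the plugin's code: `resolve_page_count`, the `lookups` mapping and the `estimate` callable are hypothetical names, and no real Goodreads request is made.

```python
def resolve_page_count(book, web_source, lookups, estimate):
    """Return (page_count, origin) for a book.

    web_source -- hypothetical setting: "none" or a key in `lookups`
    lookups    -- hypothetical mapping of source name -> function(book)
                  that returns a page count, or None when no match is found
    estimate   -- function(book) implementing the selected counting algorithm
    """
    lookup = lookups.get(web_source)
    if lookup is not None:
        pages = lookup(book)
        if pages:
            # A match was found on the website, so use its number.
            return pages, web_source
    # "No website lookup", or the website had no match: fall back.
    return estimate(book), "estimate"


if __name__ == "__main__":
    # Stand-in lookup table instead of a real website call (assumption).
    fake_site = {"Known Book": 320}
    lookups = {"goodreads": fake_site.get}
    print(resolve_page_count("Known Book", "goodreads", lookups, lambda b: 95))
    print(resolve_page_count("Short Story", "goodreads", lookups, lambda b: 95))
```

Returning the origin alongside the count is one way to address the concern about users not knowing whether a number came from a paper edition or from an ebook estimate.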
10-27-2011, 06:42 AM | #152 |
Guru
Posts: 612
Karma: 2031728
Join Date: Jan 2010
Device: PocketBook Touch (622), PocketBook Touch Lux 2, Pocketbook Touch HD 3
|
Kiwidude, thanks for your comments on my feature request. Internal conversion to ePub before doing the count would be a solution that certainly meets my needs. And wouldn't it also have the additional benefit of producing consistent word counts, because in the end only ePubs are counted? Thanks!
|
11-06-2011, 12:08 AM | #153 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Hi Kiwidude.
I've been using the plugin since day one and have been using only word count for months now. I decided to add page count again and am getting weird numbers. My files are all MOBI, being converted automatically from AZW, etc. when added to Calibre. The first book I tried has an 83,241 word count, but only a 3 page count. I tried converting it from MOBI to MOBI, and now it says it has a 6 page count. Any ideas? |
11-06-2011, 01:31 AM | #154 | |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Quote:
|
|
11-06-2011, 05:58 AM | #155 | |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Quote:
|
|
11-06-2011, 12:34 PM | #156 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Nyssa - PM me a link to a couple of those files so I can confirm what is going on. At a guess without seeing them I would say it is because of the following...
For counting pages in MOBI files, the plugin uses the Calibre code (the same code used when you send a MOBI file to your Kindle and get it to generate an APNX file). This algorithm is based on counting the number of <p> paragraph tags. However some books have the bulk of their content based on <div> tags instead.
For ePub based page counting, I have an additional check which compares the number of div tags with the number of p tags - whichever is greater is used as the basis for the count. This should give a more accurate count for div based books. Note that this logic only exists for ePub books - for those files you are having problems with, try a conversion to ePub, make sure the plugin is configured to "prefer" the ePub version and see what count it comes up with.
Arguably the Calibre MOBI page counting logic should be improved to try the same technique of comparing div with p tags and using whichever is bigger. However I leave that to user_none if he considers that desirable.
In a future version of this plugin, as mentioned above, I think I am going to *only* support counting pages based on the plugin doing a temporary conversion to ePub, so it uses my "tweaked" algorithms for estimations. This would also mean that a page count could be determined for books for which you don't have, or don't want to create, ePub or MOBI formats, such as FB2 as mentioned above. However it would also mean that someone who only used MOBI format would not get the same page count showing in Calibre as they would on their Kindle. Whether that is really an "issue" is questionable, and in exchange the plugin would be useful to more users. |
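As a rough illustration of the div-versus-p comparison described in post #156, the check could look something like the sketch below. It is not the plugin's actual code: `count_tag` and `paragraph_basis` are hypothetical helper names, and a simple regex stands in for whatever parsing the plugin really does. How the winning tag count is then turned into a page estimate is not covered here.

```python
import re


def count_tag(html, tag):
    """Count opening tags such as <p> or <p class="x"> (case-insensitive)."""
    pattern = r"<%s(?=[\s>/])" % re.escape(tag)
    return len(re.findall(pattern, html, flags=re.IGNORECASE))


def paragraph_basis(html, tags=("p", "div")):
    """Return (tag, count) for whichever candidate tag occurs most often.

    On a tie, the tag listed first in `tags` wins.
    """
    counts = {tag: count_tag(html, tag) for tag in tags}
    best = max(tags, key=lambda t: counts[t])
    return best, counts[best]


if __name__ == "__main__":
    # A div-based book: most of the text sits in <div> rather than <p>.
    sample = "<div>One</div><div>Two</div><div>Three</div><p>Note</p>"
    print(paragraph_basis(sample))
```

Passing `tags=("p", "div", "blockquote")` would extend the same comparison to other paragraph-like tags.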
11-06-2011, 02:27 PM | #157 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Nyssa - thanks for the links.
I only looked at one of the books, but it confirms my theory above. In actual fact it shows a third combination not catered for - a book that uses <blockquote> tags as paragraph tags!
If you convert those books to ePub, you will find you get a better count. Calibre still has a <blockquote> in the ePub, but its conversion also puts <p> tags around the <br/> in between the blockquotes, so you end up with a roughly equivalent count. You can get a better count estimate with two regex replacements in the ePub file - replacing the blockquote with p, then removing the pointless <p><br class="calibre1" /></p> entries. This allows the algorithm to handle the case of "very long paragraphs" to add to the page count, which are otherwise missed.
A tweak I could make to this plugin is to consider <blockquote> another permutation, and compare how many are found in the doc in the same way I do with <p> and <div> currently, which would also remove the need for the Sigil tweaking of the ePub conversion.
However no matter what I do about the above, your fundamental issues lie with user_none's implementation of the MOBI page counting algorithm, because you don't store ePub versions in your library. So either you campaign to user_none to ask him to support books that are based on any of <p>, <div> or <blockquote> tags, or this plugin gets changed to not use user_none's algorithm at all. |
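The two regex replacements mentioned in post #157 might look roughly like this in Python. This is only a sketch under assumptions: the function name `normalise_blockquotes` is made up, the patterns are one plausible way to write the replacements (the post does not give the exact expressions used in Sigil), and regexes are being applied to markup that is assumed to be as regular as Calibre's conversion output.

```python
import re


def normalise_blockquotes(html):
    """Turn <blockquote> paragraphs into <p> and drop empty <br/> spacers."""
    # 1. Rename opening and closing blockquote tags to p, keeping attributes.
    html = re.sub(r"<(/?)blockquote\b", r"<\1p", html)
    # 2. Remove spacer paragraphs such as <p><br class="calibre1" /></p>.
    html = re.sub(r"<p\b[^>]*>\s*<br[^>]*>\s*</p>", "", html)
    return html


if __name__ == "__main__":
    sample = (
        '<blockquote class="calibre2">First paragraph.</blockquote>'
        '<p class="calibre3"><br class="calibre1" /></p>'
        '<blockquote class="calibre2">Second paragraph.</blockquote>'
    )
    print(normalise_blockquotes(sample))
```

After this clean-up every real paragraph is a <p> with its text inside, so a <p>-based count sees the actual paragraph lengths rather than the empty spacers.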
11-06-2011, 02:59 PM | #158 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Thank you for taking a look.
Edit: I would campaign if I had any idea as to what I was talking about. I could conceivably have both MOBI and ePub versions in Calibre, but I have no idea what that would do to its performance level, and I would have no use for ePub files outside of corrected page numbers. Last edited by Nyssa; 11-06-2011 at 03:03 PM. |
11-06-2011, 03:06 PM | #159 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
Would it make sense to convert all of the books to ePub, pull all of the word counts and page numbers, and then delete/remove all of the ePub versions from Calibre?
|
11-07-2011, 01:03 AM | #160 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
|
11-07-2011, 03:38 AM | #161 |
Zealot
Posts: 107
Karma: 554
Join Date: Oct 2008
Device: none
|
After some testing (when this plugin was new), I found that the Calibre Viewer (Adobe) algorithm is more consistent for MOBI files than APNX. Although the page count is nowhere near that of the actual book (the trick is to divide it by 2), it's still useful for comparing ebooks.
@kiwidude: IMHO, converting MOBI to ePub to count pages is slower than the current method, so I really do hope that you could leave a no-conversion option. It's nice to know that more users (who are not using MOBI or ePub format) are going to enjoy this wonderful plugin though. |
11-07-2011, 03:48 AM | #162 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
The calibre viewer and Adobe counts are quite different. The calibre viewer has a count almost twice that of the Adobe ePub renderer on any of my Sony devices. That said, all I care about is a page count that is consistent.
|
11-07-2011, 05:43 AM | #163 |
Series Addict
Posts: 6,180
Karma: 167189477
Join Date: Dec 2010
Location: Florida, USA
Device: Kindle Paperwhite (2nd Gen)
|
I went ahead, converted everything and ran the word & page counts. The numbers make a lot more sense now. I have not deleted the ePub versions, however. If calibre slows down significantly then I'll reconsider. I'm not too happy at the thought of having to manually convert every book I add to my library, though. I don't know if I can get calibre to perform two conversions at once - to Mobi and then another to ePub.
|
11-07-2011, 08:03 AM | #164 | |
Wizard
Posts: 1,065
Karma: 858115
Join Date: Jan 2011
Device: Kobo Clara, Kindle Paperwhite 10
|
Quote:
|
|
11-07-2011, 04:45 PM | #165 |
Calibre Plugins Developer
Posts: 4,636
Karma: 2162064
Join Date: Oct 2010
Location: Australia
Device: Kindle Oasis
|
@Nyssa - as I have said above there is very little I can do, beyond not using user_none's code for MOBI files. And even if I go down that route, you will still have the problem of when you send the book to your Kindle that the page count from the generated apnx file is going to be the same wildly inaccurate values for those books.
This has all been discussed previously on this thread (start looking at page 2). IIRC user_none was reluctant to change the existing logic because his focus was on performance; he did not want to significantly slow down the sending to device process. The only resolution I can think of that "might" keep everyone happy is offering an option in the APNX file generation to pull the value from a custom column, rather than always generating it using its (clearly drastically flawed in some cases) current counting implementation. That way this plugin can use whatever approach it wants to (such as pulling from a website, with a fallback to counting based on the max of <div>, <p> or <blockquote> tags). And then when you send the book to your Kindle, this "better" page count value could be used on your Kindle. How much work that would be to implement if it is agreed, and whether user_none has an interest or is willing to have the existing code changed, is quite another issue; it would impact his standalone APNX file generator plugin as well. |