@Nyssa - thanks for the links.
I only looked at one of the books, but it confirms my theory above. In fact it shows a third combination that isn't catered for: a book that uses <blockquote> tags as paragraph tags!
If you convert those books to ePub, you will find you get a better count. Calibre still leaves the <blockquote> tags in the ePub, but its conversion also puts <p> tags around the <br/> between the blockquotes, so you end up with a roughly equivalent count. You can get a better estimate with two regex replacements in the ePub file: replace each blockquote with p, then remove the pointless <p><br class="calibre1" /></p> entries. This lets the algorithm pick up the "very long paragraphs" that add to the page count, which are otherwise missed.
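For anyone who would rather script those two replacements than do them by hand, here is a rough sketch in Python. The function name is made up for illustration, and the patterns assume the markup looks exactly like what I described above (in particular the calibre1 class on the <br/>), so check them against your own converted file first.

```python
import re

def normalise_blockquotes(xhtml):
    """Hypothetical helper: apply the two replacements to one XHTML string."""
    # Step 1: rename <blockquote ...> and </blockquote> to <p ...> and </p>,
    # keeping any attributes that were on the opening tag.
    xhtml = re.sub(r'<(/?)blockquote\b', r'<\g<1>p', xhtml)
    # Step 2: drop the spacer paragraphs that only wrap a <br/>.
    # Assumes the class is "calibre1"; adjust if your conversion differs.
    xhtml = re.sub(r'<p>\s*<br class="calibre1"\s*/>\s*</p>\s*', '', xhtml)
    return xhtml

sample = (
    '<blockquote class="calibre2">One</blockquote>\n'
    '<p><br class="calibre1" /></p>\n'
    '<blockquote class="calibre2">Two</blockquote>'
)
print(normalise_blockquotes(sample))
```

You would run this over each content file inside the ePub, not over the zipped .epub itself.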
A tweak I could make to this plugin is to treat <blockquote> as another permutation and compare how many are found in the document, in the same way I currently do with <p> and <div>. That would also remove the need to tweak the converted ePub in Sigil.
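To give an idea of what that comparison might look like, here is a minimal sketch. It is not the plugin's actual code: the function names are invented, and "pick whichever tag occurs most often" is my simplification of the idea, not necessarily the rule the plugin would end up using.

```python
import re
from collections import Counter

# Tags a book might be using as its paragraph container.
CANDIDATE_TAGS = ("p", "div", "blockquote")

def count_paragraph_tags(xhtml):
    """Count opening tags for each candidate (hypothetical helper)."""
    counts = Counter()
    for tag in CANDIDATE_TAGS:
        # \b stops <p from also matching <pre> or <param>.
        pattern = r'<%s\b[^>]*>' % tag
        counts[tag] = len(re.findall(pattern, xhtml, flags=re.IGNORECASE))
    return counts

def dominant_paragraph_tag(xhtml):
    """Return the candidate tag that appears most often (assumed rule)."""
    counts = count_paragraph_tags(xhtml)
    return max(CANDIDATE_TAGS, key=lambda tag: counts[tag])

sample = '<div><blockquote>a</blockquote><br/><blockquote>b</blockquote><p>c</p></div>'
print(count_paragraph_tags(sample))
print(dominant_paragraph_tag(sample))
```

On a tie this returns whichever tag comes first in CANDIDATE_TAGS, which is an arbitrary choice on my part.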
However, no matter what I do about the above, your fundamental issue lies with user_none's implementation of the mobi page counting algorithm, because you don't store ePub versions in your library. So either you campaign to user_none to support books that are based on any of <p>, <div> or <blockquote> tags, or this plugin gets changed to not use user_none's algorithm at all.