MobileRead Forums - How does the calibre viewer calculate page number and total pages?

auspex · 09-23-2014, 03:36 PM

Quote:

Originally Posted by kovidgoyal

There is nothing that translates a bookmark into a page number. A page number is simply defined as

(number of pages of current html file * frac of file scrolled)/(total number of pages of all current html files)

If you are asking how the viewer scrolls to a bookmark, look at cfi.coffee

Well, I wasn't asking how it scrolls; I was asking how you calculate the current page number, but I was afraid that was the answer. It means I have to do it myself, as there's nothing in calibre I can call to do it. Still, it's not as awkward as I was expecting: I can see the spine and the page counts in the iterator (though it's confusing that it's called an iterator when it doesn't meet the Python definition of an iterator...).
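Doing it myself would mean combining the per-file page counts with the scrolled fraction. A minimal sketch, assuming the page counts are read off the spine entries in calibre's iterator, that pages are numbered from 1, and that the current position is the pages of all preceding spine files plus the scrolled fraction of the current one. `page_position` and its arguments are hypothetical names, not calibre API.

```python
def page_position(pages_per_file, current, frac):
    """Return (page, total) for a reader who has scrolled `frac` (0..1)
    through spine file number `current` (0-based).

    Assumption: `pages_per_file` holds one page count per spine file,
    in spine order, as seen in the viewer's iterator.
    """
    if not 0 <= current < len(pages_per_file):
        raise IndexError("spine index out of range")
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be between 0 and 1")
    total = sum(pages_per_file)
    # First page of the current file: every page of the earlier files, plus one.
    start_page = 1 + sum(pages_per_file[:current])
    # Scale by (pages - 1) so the end of a file lands on its own last page.
    page = start_page + frac * (pages_per_file[current] - 1)
    return page, total


if __name__ == "__main__":
    # Hypothetical book: three spine files of 3, 5 and 2 pages.
    print(page_position([3, 5, 2], 1, 0.5))
```

Scaling by `pages - 1` rather than `pages` is a design choice: with it, fully scrolling a file reports that file's last page instead of the first page of the next one.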

I guess it's fortunate that I lucked into a poorly formatted page on my first test. The calibre viewer and the Sony bookmark had similar pointers into this structure (.../2[heading_id_2]/4@4.9:0 and .../2/4:1, respectively):

Code:

<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>

Of course it makes no sense to have a self-closing anchor tag, and both the BeautifulSoup and BeautifulStoneSoup parsers parse it as:

Code:

<h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1>

It's not going to make any real difference what result I get for this page, but it demonstrates a very real problem: the tree my parser builds isn't the tree the bookmark was computed against, and I'm not sure how to work around it.
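A minimal sketch of why the mis-parse matters, using only the Python standard library. It pairs a deliberately simplified parser for the visible tail of the two bookmarks (the leading part was elided as "..." in the post and is not reconstructed) with a resolver that follows element steps through both trees. It assumes, hypothetically, that the `<h1>` is the first element child of a `<body>` so that `/2[heading_id_2]` reaches it. `parse_cfi_tail` and `resolve` are illustrative names, not the cfi.coffee API, and the grammar covers only steps, `[id]` assertions, `:n` character offsets and `@x:y` spatial offsets.

```python
import re
import xml.etree.ElementTree as ET

# Simplified CFI tail grammar (assumption: NOT the full EPUB CFI spec):
# steps such as /4 or /2[some_id], then optionally :n (character offset)
# or @x:y (spatial offset).
STEP = re.compile(r"/(\d+)(?:\[([^\]]*)\])?")
TERMINAL = re.compile(r"(?::(\d+)|@([\d.]+):([\d.]+))?")


def parse_cfi_tail(path):
    """Split '/2[heading_id_2]/4@4.9:0' into (steps, char_offset, spatial)."""
    steps, pos = [], 0
    while (m := STEP.match(path, pos)):
        steps.append((int(m.group(1)), m.group(2)))  # (index, id assertion or None)
        pos = m.end()
    t = TERMINAL.fullmatch(path, pos)
    if t is None:
        raise ValueError("unsupported CFI syntax: %r" % path[pos:])
    char_offset = int(t.group(1)) if t.group(1) is not None else None
    spatial = (float(t.group(2)), float(t.group(3))) if t.group(2) is not None else None
    return steps, char_offset, spatial


def resolve(elem, steps):
    """Follow element (even) steps below elem; return None if a step has no target."""
    for index, id_assertion in steps:
        children = list(elem)
        k = index // 2 - 1
        if index % 2 or not 0 <= k < len(children):
            return None  # odd steps address text; this sketch only follows elements
        elem = children[k]
        if id_assertion is not None and elem.get("id") != id_assertion:
            return None  # the [id] assertion does not match the node reached
    return elem


# Hypothetical <body> wrapper so that /2 reaches the <h1>.
AS_WRITTEN = """<body><h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1></body>"""
AS_PARSED = """<body><h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1></body>"""

if __name__ == "__main__":
    for cfi in ("/2[heading_id_2]/4@4.9:0", "/2/4:1"):
        steps, char_offset, spatial = parse_cfi_tail(cfi)
        for name, markup in (("as written", AS_WRITTEN), ("as parsed", AS_PARSED)):
            node = resolve(ET.fromstring(markup), steps)
            print(cfi, name, None if node is None else node.tag)
```

With the tree as written, `/2[heading_id_2]/4` lands on the `<img>`. With the tree the BeautifulSoup parsers produced, the `<h1>` has only one element child, so the same step has no target and the image is reachable only as `/2/2/2`.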

Quote:

And note that EPUB 3 CFI does not count text nodes. It makes no sense to count text nodes, since:

1) Text nodes can be normalized by the renderer
2) Offsets as numbers of characters in the terminal tag are recorded in the CFI in any case, making counting text nodes totally useless.

What the EPUB CFI spec does is assign odd-numbered indices to represent the text between tags, regardless of how many actual text nodes there are. So tags are always even-numbered.
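A small sketch of the even/odd numbering, assuming an `xml.etree.ElementTree` parse (whose default parser drops comments and processing instructions). ElementTree stores the character data of each gap as a single string (`elem.text` before the first child, `child.tail` after each child), so every gap maps to exactly one odd index whether it is empty or not. `cfi_child_indices` is a hypothetical helper for illustration.

```python
import xml.etree.ElementTree as ET


def cfi_child_indices(elem):
    """Return (index, kind, value) for each CFI-addressable slot under elem.

    Element children get even indices 2, 4, 6, ...; the character data
    before, between and after them gets the odd index in between, however
    many text nodes a renderer might split it into.
    """
    slots = [(1, "text", elem.text or "")]
    for k, child in enumerate(elem):
        slots.append((2 * k + 2, "element", child.tag))
        slots.append((2 * k + 3, "text", child.tail or ""))
    return slots


H1 = """<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>"""

if __name__ == "__main__":
    for index, kind, value in cfi_child_indices(ET.fromstring(H1)):
        print(index, kind, repr(value))
```

For the heading in this thread it yields indices 1 through 5, with the anchor at 2 and the image at 4, which is consistent with both bookmarks ending in a `/4` step.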

Right, I was (probably mis-)quoting from someone else's simplified explanation of the EPUB CFI, but I really did know there could be more than one text node.