09-23-2014, 03:36 PM #9
auspex
Addict
Posts: 201
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
Quote:
Originally Posted by kovidgoyal
There is nothing that translates a bookmark into a page number. A page number is simply defined as

(number of pages of current html file * frac of file scrolled)/(total number of pages of all current html files)

If you are asking how the viewer scrolls to a bookmark, look at cfi.coffee.

Well, I wasn't asking how it scrolls; I was asking how you calculate the current page number, but I was afraid that was the answer. It means I have to do it myself, since there's nothing in calibre I can call to do it. Still, it's not as awkward as I thought, because I can see the spine and the page counts in the iterator (though it's confusing that it's called an iterator when it doesn't meet the Python definition of an iterator...).
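A minimal sketch of doing that by hand, assuming the per-file page counts have already been pulled out of the iterator as a plain list in spine order (book_position and its parameters are hypothetical names, not calibre's API). It also adds the pages of the files that come before the current one in the spine, on the assumption that a book-wide page number has to include them; dividing the result by the total gives the position as a fraction:

```python
from typing import Sequence, Tuple


def book_position(page_counts: Sequence[int], spine_index: int, frac: float) -> Tuple[float, int]:
    """Return (current page, total pages) for a position in the book.

    page_counts: hypothetical per-file page counts, in spine order.
    spine_index: index of the HTML file currently displayed.
    frac: fraction of that file scrolled, from 0.0 to 1.0.
    """
    if not 0 <= spine_index < len(page_counts):
        raise IndexError("spine_index out of range")
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be between 0.0 and 1.0")
    # Pages in every file before the current one (an assumption of this sketch).
    pages_before = sum(page_counts[:spine_index])
    # Plus the scrolled part of the current file.
    current = pages_before + page_counts[spine_index] * frac
    return current, sum(page_counts)
```

For example, book_position([3, 10, 7], 1, 0.5) returns (8.0, 20): the 3 pages of the first file plus half of the second file's 10 pages. The viewer may round or count from 1 instead of 0; this sketch does neither.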

I guess it's fortunate that I lucked into a poorly formatted page on my first test. The calibre viewer and the Sony bookmark had similar pointers into this structure (.../2[heading_id_2]/4@4.9:0 and .../2/4:1, respectively):
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>
Of course, it makes no sense to have a self-closing anchor tag, and both the BeautifulSoup and BeautifulStoneSoup parsers parse it as:
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1>
It's not going to make any real difference what result I get for this page, but it demonstrates a very real problem, and I'm not sure how to work around it.
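To make the problem concrete, here is a small sketch using the standard library's xml.etree.ElementTree. It assumes the /2[heading_id_2] step has already landed on the h1 (that is what the id assertion says) and follows only the remaining element step, where /4 means the second child element. The follow_steps helper is hypothetical and does not handle odd (text) steps. Against the markup as written, /4 reaches the img; against the tree the BeautifulSoup parsers built, the h1 has only one child element, so /4 points at nothing and the image has moved to /2/2:

```python
import xml.etree.ElementTree as ET

# The heading as it appears in the EPUB; an XML parser treats <a/> as empty.
AS_WRITTEN = """<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>"""

# The same heading as the BeautifulSoup parsers rebuilt it, with <img> inside <a>.
AS_REPARSED = """<h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1>"""


def follow_steps(root, steps):
    """Follow even-numbered CFI steps from root; return the element or None.

    Step 2 is the first child element, 4 the second, and so on.
    Odd steps (text positions) are not handled in this sketch.
    """
    node = root
    for step in steps:
        if step % 2:
            raise ValueError("odd (text) steps are not handled here")
        children = list(node)
        k = step // 2 - 1
        if k < 0 or k >= len(children):
            return None  # the step points past the available child elements
        node = children[k]
    return node


for label, markup in (("as written", AS_WRITTEN), ("re-parsed", AS_REPARSED)):
    h1 = ET.fromstring(markup)
    target = follow_steps(h1, [4])
    print(label, "->", None if target is None else target.tag)
```

This prints "as written -> img" and "re-parsed -> None", which is the mismatch between the two pointers in a nutshell.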

Quote:
And note that EPUB 3 CFI does not count text nodes. It makes no sense to count text nodes, since:

1) Text nodes can be normalized by the renderer
2) Offsets as numbers of characters in the terminal tag are recorded in the CFI in any case, making counting text nodes totally useless.

What the EPUB CFI spec does is assign odd numbered indices to represent the text between tags regardless of how many actual text nodes there are. So tags are always even numbered.
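A minimal sketch of that numbering, using the standard library's xml.dom.minidom because it can hold adjacent text nodes (cfi_indices is a hypothetical helper). Element children get 2, 4, 6, ...; every text node between two elements shares the single odd index that follows the preceding element. Comments and processing instructions are simply skipped here, which is an assumption of the sketch rather than a reading of the spec:

```python
from xml.dom import Node, minidom

TEXT_LIKE = (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)


def cfi_indices(parent):
    """Return (index, node) pairs for the children of parent, CFI-style.

    Element children get even indices: 2, 4, 6, ...
    Text nodes get the odd index after the last element seen
    (1 before any element), so adjacent text nodes share one index.
    """
    pairs = []
    elements_seen = 0
    for child in parent.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            elements_seen += 1
            pairs.append((2 * elements_seen, child))
        elif child.nodeType in TEXT_LIKE:
            pairs.append((2 * elements_seen + 1, child))
        # Other node types (comments, processing instructions) are skipped.
    return pairs


doc = minidom.parseString("<p>One <b>two</b> three</p>")
p = doc.documentElement
p.appendChild(doc.createTextNode(" four"))  # a second text node right after " three"

for index, node in cfi_indices(p):
    label = node.tagName if node.nodeType == Node.ELEMENT_NODE else repr(node.data)
    print(index, label)
```

The output is 1 'One ', 2 b, 3 ' three', 3 ' four': the two text nodes after the b element share index 3, so splitting or merging text nodes never shifts the even indices of the tags.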

Right, I was (probably mis-)quoting from someone else's simplified explanation of the EPUB CFI, but I really did know there could be more than one text node.