Old 09-15-2014, 09:48 PM   #1
auspex
Groupie
Posts: 199
Karma: 1071756
Join Date: Sep 2012
Location: Nova Scotia
Device: Kobo Aura, Nexus 5x
How does the calibre viewer calculate page number and total pages?

I'm working on a port of davidfor's Kobo Utilities to Sony, and trying to find a reasonable way to find my position in a book that I'm currently reading. Sony doesn't make it easy. If you downloaded a book from Sony's store (or, since they sold out, from Kobo), they maintain a table that gives, among other things, "percent read". But if your book is sideloaded, Sony appears to calculate the number of pages and your current page number on the fly, and never saves them in its database (and of course they don't tell us how they do it).

So, I'm trying to figure out how the calibre viewer calculates these numbers, and can't find the code anywhere.
Old 09-16-2014, 12:00 AM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
iterator/book.py
Old 09-16-2014, 12:35 AM   #3
davidfor
Grand Sorcerer
Posts: 24,907
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
For epubs on the Sony devices, the number of pages will be calculated by the Adobe RMSDK, and the current page will be based on that. The description of the method is in the Wiki, but have a look at the Count Pages plugin for an implementation.

To calculate a percent read, what Kovid pointed to should work. You will also need the current position from the database on the Sony. From memory, this is stored in an Adobe-specific way. I assume it comes from the RMSDK, as the Kobos use it for epubs as well. The calibre viewer uses a different position method (the same as for epub3?). I don't know if there is already a way to translate between them, but it shouldn't be too hard*.

From memory, iterator/book.py has to unpack the book to work. That means calculating the percent read could take some time. For one book, it shouldn't be too bad, but if you are doing it for all the books on the device, it might take a while. I suppose that should only happen once, when the store positions is first run.

* Imagine me laughing maniacally while I typed that.
Old 09-16-2014, 09:10 AM   #4
auspex
Groupie
As far as I can tell, the only thing that Sony stores for position of sideloaded books is the bookmark. Which is like an EPUB3 CFI, but not identical (nor is calibre's), but is easily (so much for maniacal laughter!) translated to the calibre format (they're closer to each other than to EPUB3). FWIW, EPUB3 counts nodes (text nodes + tags) while calibre/Sony seem to count only tags, with the significant difference that Sony's CFIs don't count the <HEAD> tag.

So, in any case, it's going to have to open the book to calculate the position.

Thanks for the answers. Now, off to try some more stuff!
Old 09-16-2014, 09:55 AM   #5
davidfor
Grand Sorcerer
Hmm, you're right, it is easy. It's been a while since I compared the two methods. And not counting the head tag has always bugged me when I've looked at this.

With that, it would be easy to put the reading position or bookmarks into the epub for the viewer.
Old 09-16-2014, 11:31 AM   #6
auspex
Groupie
That's what I was thinking.
Old 09-16-2014, 10:06 PM   #7
auspex
Groupie
iterator/book.py calculates the total number of pages. I'm still not seeing anything that translates a bookmark into a current page number.
Old 09-16-2014, 11:21 PM   #8
kovidgoyal
creator of calibre
There is nothing that translates a bookmark into a page number. A page number is simply defined as

(number of pages of current html file * frac of file scrolled)/(total number of pages of all current html files)
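One way to read that definition, as a minimal sketch only: the function names and the list of per-file page counts are hypothetical, not calibre's API. The term for pages in earlier spine files is an assumption added here so that the position grows across the whole book; the formula as posted only mentions the current file.

```python
def current_page(pages_per_file, spine_index, frac_scrolled):
    """Estimate the page position within the whole book.

    pages_per_file: page count of each HTML file, in spine order (assumed input)
    spine_index:    index of the file currently being read
    frac_scrolled:  fraction (0.0 to 1.0) of that file already scrolled
    """
    # Assumption: pages in files before the current one count as read.
    pages_before = sum(pages_per_file[:spine_index])
    return pages_before + pages_per_file[spine_index] * frac_scrolled


def percent_read(pages_per_file, spine_index, frac_scrolled):
    """Express the same position as a percentage of the total page count."""
    total = sum(pages_per_file)
    return 100.0 * current_page(pages_per_file, spine_index, frac_scrolled) / total


# Three files of 10, 20 and 30 pages; halfway through the second file.
print(current_page([10, 20, 30], 1, 0.5))
print(round(percent_read([10, 20, 30], 1, 0.5), 2))
```

The per-file page counts would have to come from whatever does the counting (the iterator in calibre, the RMSDK on the device), so the two readers will not necessarily agree on the result.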

If you are asking how the viewer scrolls to a bookmark, look at cfi.coffee

And note that EPUB 3 CFI does not count text nodes. It makes no sense to count text nodes, since:

1) Text nodes can be normalized by the renderer
2) Offsets as numbers of characters in the terminal tag are recorded in the CFI in any case, making counting text nodes totally useless.

What the EPUB CFI spec does is assign odd numbered indices to represent the text between tags regardless of how many actual text nodes there are. So tags are always even numbered.
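The even-numbered indexing described above can be sketched as follows. This is only an illustration using Python's standard ElementTree; `cfi_steps` is a hypothetical helper, not something from calibre or the EPUB spec, and it ignores text offsets, ID assertions and everything else a real CFI carries.

```python
import xml.etree.ElementTree as ET


def cfi_steps(root, target):
    """Return the even-numbered step indices leading from root to target."""
    def walk(node, path):
        # Iterating an Element yields only its element children, so the
        # n-th child element gets step 2*n (2, 4, 6, ...). The odd indices
        # in between are left for the text around the elements.
        for position, child in enumerate(node, start=1):
            step = 2 * position
            if child is target:
                return path + [step]
            found = walk(child, path + [step])
            if found:
                return found
        return None

    return walk(root, [])


doc = ET.fromstring("<body><p>one</p><p>two <em>three</em></p></body>")
em = doc.find(".//em")
print("/" + "/".join(str(step) for step in cfi_steps(doc, em)))  # /4/2
```

Because the indices depend only on element order, splitting or merging text nodes in the renderer does not change them.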
Old 09-23-2014, 03:36 PM   #9
auspex
Groupie
Quote:
Originally Posted by kovidgoyal View Post
There is nothing that translates a bookmark into a page number. A page number is simply defined as

(number of pages of current html file * frac of file scrolled)/(total number of pages of all current html files)

If you are asking how the viewer scrolls to a bookmark, look at cfi.coffee
Well, I wasn't asking how it scrolls; I was asking how you calculate the current page number, but I was afraid that was the answer. Which means I have to do it myself, as there's nothing I can call in calibre to do it. Still, it's not as awkward as I was thinking, as I can see the spine and the page counts in the iterator (though it's confusing that it's called an iterator when it doesn't meet the Python definition of an iterator...)

I guess it's fortunate that I lucked into a poorly formatted page on my first test. The calibre viewer and the Sony bookmark had similar pointers into this structure (.../2[heading_id_2]/4@4.9:0 and .../2/4:1, respectively)
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>
Of course it makes no sense to have a self-closing anchor tag, and both the BeautifulSoup and BeautifulStoneSoup parsers parse it as:
Code:
<h1 class="part" id="heading_id_2">
  <a id="page10">
    <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
  </a>
</h1>
It's not going to make any real difference what result I get for this page, but it demonstrates a very real problem, and I'm not sure how to work around it.
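For comparison, a small sketch of what a strict XML parser makes of the same markup, using only Python's standard library. The step arithmetic assumes the even-numbered element indexing of EPUB CFI; whether Sony and calibre number this page exactly that way is an inference from the two pointers quoted above, not something verified.

```python
import xml.etree.ElementTree as ET

SNIPPET = """<h1 class="part" id="heading_id_2">
  <a id="page10"/>
  <img alt="" src="../Images/Wint_9781594745775_epub_001_r1.jpg"/>
</h1>"""

h1 = ET.fromstring(SNIPPET)

# As XML, the self-closing <a/> is an empty element, so <img> stays a
# sibling of <a> instead of being swallowed as its child.
children = [child.tag for child in h1]
print(children)  # ['a', 'img']

# With element children numbered 2, 4, 6, ... the <img> is step 4.
img_step = 2 * (children.index("img") + 1)
print(img_step)  # 4
```

That would fit the trailing /4 in both pointers, whereas the nested result from BeautifulSoup would make the image the first child of the anchor.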

Quote:
And note that EPUB 3 CFI does not count text nodes. It makes no sense to count text nodes, since:

1) Text nodes can be normalized by the renderer
2) Offsets as numbers of characters in the terminal tag are recorded in the CFI in any case, making counting text nodes totally useless.

What the EPUB CFI spec does is assign odd numbered indices to represent the text between tags regardless of how many actual text nodes there are. So tags are always even numbered.
Right, I was (probably mis-)quoting from someone else's simplified explanation of the EPUB CFI but I really did know there could be more than one text node.
Old 09-24-2014, 08:10 AM   #10
kovidgoyal
creator of calibre
You should not use BeautifulSoup to parse. The parsing strategy to follow would be:

1) Try to parse as XML, implementing various simple corrections so that only slightly invalid documents still parse.
2) If (1) fails, parse as HTML 5
3) If (2) fails parse as HTML 4 and/or use BeautifulSoup

See parse_utils.py in the calibre source code.

Of course, the correct solution is to use the exact parsing algorithm used by the software that generated the CFI. Since that is not practical, IMO the above cascade will likely give you the best results, with perhaps a few modifications to handle common cases.
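A rough sketch of such a cascade, under stated assumptions: it does not reproduce calibre's parse_utils.py. The single `&nbsp;` fix stands in for the "simple corrections", html5lib is a third-party package that may not be installed, and the last stage is a crude standard-library stand-in for an HTML 4 / BeautifulSoup style parser that keeps only the tag structure.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def parse_xml(text):
    # Stage 1: XML, after one illustrative correction. &nbsp; is not
    # defined in plain XML, so swap in its numeric form.
    return ET.fromstring(text.replace("&nbsp;", "&#160;"))


def parse_html5(text):
    # Stage 2: HTML 5 via html5lib (third-party). If it is missing, the
    # ImportError simply pushes the cascade on to stage 3.
    import html5lib
    return html5lib.parse(text, treebuilder="etree",
                          namespaceHTMLElements=False)


class _LooseBuilder(HTMLParser):
    """Stage 3 stand-in: forgiving tag-soup builder (text is ignored)."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        el = ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})
        if tag not in VOID_TAGS:
            self.stack.append(el)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax: create the element but do not descend into it.
        ET.SubElement(self.stack[-1], tag, {k: v or "" for k, v in attrs})

    def handle_endtag(self, tag):
        # Close back to the nearest matching open tag, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def parse_loose(text):
    builder = _LooseBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


def parse_with_fallback(text):
    """Try each parser in order; return (stage name, tree) for the first success."""
    for name, parser in (("xml", parse_xml), ("html5", parse_html5),
                         ("loose", parse_loose)):
        try:
            return name, parser(text)
        except Exception:
            continue
    raise ValueError("no parser could handle the document")


print(parse_with_fallback('<h1><a id="page10"/><img src="x.jpg"/></h1>')[0])  # xml
```

Returning the stage name alongside the tree makes it easy to log which books fell through to the looser parsers, since those are the ones where a computed position is most likely to disagree with the device.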