What ePub page/position calculation methods are there?


mattcurtis
05-27-2013, 02:25 PM
Hey guys,

I'm building an ePub reader, and I need to calculate and display the user's position inside of that ePub. However, section/chapter size, words per page, and styling all vary between ePubs, which makes this a bit more difficult than I'd like.

I'm using a web-based solution, and since my implementation is page-oriented, I know what page a user is on, but only inside that section/chapter. Rendering all sections and constantly calculating the full number of pages and the current page uses too much memory. I was considering simply going with "Chapter X, Page X/X", but this made me wonder what other methods have been employed, or are recommended, for letting the user know where they are in a book.

Thanks for any advice!

DaleDe
05-28-2013, 01:26 PM
Adobe Digital Editions is the most popular reader for ePub. They use 1024 readable characters to determine a page size so no matter what size screen or font size there will be a one to one correspondence to the page number. You can read about this in our wiki and other issues with page numbers.

Dale

Jellby
05-29-2013, 03:03 AM
Adobe Digital Editions is the most popular reader for ePub. They use 1024 readable characters to determine a page size so no matter what size screen or font size there will be a one to one correspondence to the page number.

Not exactly that way. If I remember correctly, it's one page per 1024 bytes in the compressed file (so changing the compression level does change the number of pages), and then the pages are more or less evenly distributed by displayed characters (or maybe just characters, so comments and HTML code would count).

Ah, the relevant information is in the wiki: http://wiki.mobileread.com/wiki/Adobe_Digital_Editions#Page_numbers

I'm building an ePub reader, and I need to calculate and display the user's position inside of that ePub. However, section/chapter size, words per page, and styling all vary between ePubs, which makes this a bit more difficult than I'd like

Well, why not use a percent position? Beware of using the word "chapter", as not all books are split at chapter level, and even when they are, the numbering might not match the actual chapter numbering, and that would be confusing.
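
A minimal sketch of a percent position that only needs the currently rendered section, assuming each spine section's weight is approximated by its size in bytes or characters (that weighting is an assumption, and the function and parameter names are hypothetical, not taken from any reader):

```python
def overall_percent(section_sizes, section_index, page_in_section, pages_in_section):
    """Estimate the overall position in the book as a percentage (0-100).

    section_sizes    -- size of every spine section (bytes or characters)
    section_index    -- 0-based index of the section currently displayed
    page_in_section  -- 1-based page number inside that section
    pages_in_section -- page count of that section as currently rendered
    """
    total = sum(section_sizes)
    if total == 0:
        return 0.0
    # Everything in earlier sections has already been read.
    before = sum(section_sizes[:section_index])
    # Fraction of the current section that lies before the current page.
    within = (page_in_section - 1) / pages_in_section if pages_in_section else 0.0
    return 100.0 * (before + within * section_sizes[section_index]) / total
```

Only the current section has to be laid out; the other sections contribute nothing but their sizes, which can be read without rendering them.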

mattcurtis
05-29-2013, 01:27 PM
Not exactly that way. If I remember correctly, it's one page per 1024 bytes in the compressed file (so changing the compression level does change the number of pages), and then the pages are more or less evenly distributed by displayed characters (or maybe just characters, so comments and HTML code would count).

Ah, the relevant information is in the wiki: http://wiki.mobileread.com/wiki/Adobe_Digital_Editions#Page_numbers



Well, why not use a percent position? Beware of using the word "chapter", as not all books are split at chapter level, and even when they are, the numbering might not match the actual chapter numbering, and that would be confusing.

:thanks: I've decided to go with percents.

JSWolf
05-30-2013, 02:54 PM
Well, why not use a percent position? Beware of using the word "chapter", as not all books are split at chapter level, and even when they are, the numbering might not match the actual chapter numbering, and that would be confusing.

Discworld books do not have chapters. So in the case of all of those books, using the term chapter would be incorrect. There is also no X/X until the next chapter, so that doesn't work either.

JSWolf
05-30-2013, 02:55 PM
:thanks: I've decided to go with percents.

% alone is not OK. I'd prefer ADE style page numbers. Both would do very well.

mattcurtis
05-31-2013, 01:25 AM
% alone is not OK. I'd prefer ADE style page numbers. Both would do very well.

I'm using a sort of compromise: inside of a section, you see both where you are inside of its pages, and your overall percent inside of the book. Similar to Stanza.

Exact ADE style is a bit harder for me to implement at this point, but it's something I have planned out for the future.

JSWolf
05-31-2013, 01:37 AM
Take a look at the Count Pages plugin for Calibre. It has the code to do the page count the ADE way, and you can then implement it in your program from the start instead of in some update. Best to do it that way rather than similar to Stanza (lousy program).

Jellby
05-31-2013, 03:09 AM
% alone is not OK. I'd prefer ADE style page numbers. Both would do very well.

Yes, "pages" would be fine to get an idea of the total length of the book. And it's very easy to calculate the total number of ADE-style pages, so one could have "95.00% of 20 pages" or "1.00% of 1900 pages" (both are "page 19"). Better yet, don't use the word "page", as it's misleading, maybe something like "block".
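
As a rough sketch of that arithmetic (the function name is hypothetical, and rounding up to a whole page is an assumption chosen so that both examples above land on page 19):

```python
import math


def synthetic_page(percent, total_pages):
    """Map an overall percent position (0-100) to a 1-based synthetic page."""
    # Multiply before dividing so exact inputs such as 95 * 20 / 100
    # do not pick up floating-point error before rounding.
    position = percent * total_pages / 100
    # Clamp so 0% maps to page 1 and 100% maps to the last page.
    return min(total_pages, max(1, math.ceil(position)))


print(synthetic_page(95.0, 20))    # 19
print(synthetic_page(1.0, 1900))   # 19
```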

And having the relative position inside the current "chapter" is also a very useful feature, at least for the books that have chapters.

mattcurtis
05-31-2013, 05:09 PM
Take a look at the Count Pages plugin for Calibre. It has the code to do the page count the ADE way, and you can then implement it in your program from the start instead of in some update. Best to do it that way rather than similar to Stanza (lousy program).

I wish it was written in something more accessible than Python :(

JSWolf
05-31-2013, 05:49 PM
I wish it was written in something more accessible than Python :(

You could ask the author for help with how it's done to get the same page number as ADE.

mattcurtis
05-31-2013, 05:50 PM
And part of my problem is that my reader will allow for extensive CSS customization (well, at least above the norm), and every time the user changes those CSS settings I will have to recalculate the number of pages, and ADE probably won't cut it if it's going to be accurate. I'll have to experiment with how long it takes to get the full number of pages in large ePubs, and see if that's feasible :/

PeterT
05-31-2013, 06:38 PM
Actually I would have thought that with the ADE style of page numbering, no degree of changes to the CSS would affect the page number. After all, that is the whole idea behind the synthetic numbering. This is as opposed to the page numbers in (for instance) the Kobo ePub variant, which reflect the actual number of screens.

mattcurtis
05-31-2013, 06:53 PM
Actually I would have thought that with the ADE style of page numbering, no degree of changes to the CSS would affect the page number. After all, that is the whole idea behind the synthetic numbering. This is as opposed to the page numbers in (for instance) the Kobo ePub variant, which reflect the actual number of screens.

Your saying that makes me think I don't understand ADE paging fully.

Back to researching.

PeterT
05-31-2013, 07:05 PM
ADE page numbers are constant, regardless of things like font sizes / screen size etc. Roughly speaking, as others have said, they are based on a 1024 character chunk of the text; each 1k is a page.

mattcurtis
05-31-2013, 07:28 PM
ADE page numbers are constant, regardless of things like font sizes / screen size etc. Roughly speaking, as others have said, they are based on a 1024 character chunk of the text; each 1k is a page.

Here's what confuses me about that: how does that account for paragraph margin, font-size, etc? I mean, if the font size is large enough, how will the page numbers ever map correctly to what the user is seeing? Or do page numbers account for swathes of text? I can't seem to find information that explains it thoroughly, or at least has code that implements it.

JSWolf
05-31-2013, 07:37 PM
Here's what confuses me about that: how does that account for paragraph margin, font-size, etc? I mean, if the font size is large enough, how will the page numbers ever map correctly to what the user is seeing? Or do page numbers account for swathes of text? I can't seem to find information that explains it thoroughly, or at least has code that implements it.

I've already told you that the Count Pages Calibre plug-in has the code to do the ADE page count. You groaned that it's in Python. Did you forget that fast? You can look there and see how it's calculated.

ADE's method of pages uses a count of the text in the XML files. It does not matter what the CSS tells the XML how to display or what the user has set the font size to. The XML is a set file. Nothing the CSS does changes that. How it displays does not change the size of the file.

mattcurtis
05-31-2013, 08:27 PM
I've already told you that the Count Pages Calibre plug-in has the code to do the ADE page count. You groaned that it's in Python. Did you forget that fast? You can look there and see how it's calculated.

ADE's method of pages uses a count of the text in the XML files. It does not matter what the CSS tells the XML how to display or what the user has set the font size to. The XML is a set file. Nothing the CSS does changes that. How it displays does not change the size of the file.

I'm still looking at its code, actually - it's been helpful, even though my Python sucks. I was asking for more information about the methodology so I can understand how to translate that code into Obj-C/JavaScript (what I'm working with).

Secondly, if I'm not working with a compressed ePub, but an extracted one, does the same 1024 bytes apply?

mattcurtis
05-31-2013, 09:28 PM
Finally found something that explains this in detail: http://bookclubs.barnesandnoble.com/t5/NOOK-Book-Discussion/EPUBs-and-page-numbering/td-p/691602

I think it's starting to map itself together in my head (my code, at least). I'm not as confused anymore. Thanks for the help.

JSWolf
05-31-2013, 09:41 PM
Secondly, if I'm not working with a compressed ePub, but an extracted one, does the same 1024 bytes apply?

There is no such thing as an extracted ePub. An ePub is a ZIP container, so you work with the ePub as it is for the page count. You don't count things like images, CSS, OPF, etc. You only count the files listed in the spine in the OPF. You count 1024 compressed bytes, and whatever that becomes when uncompressed is your page.
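
A minimal sketch of that counting step, assuming one synthetic page per 1024 compressed bytes, rounded up separately for each spine file (the per-file rounding and the minimum of one page are assumptions, and this is not the Count Pages plugin's code). It uses only Python's standard zipfile and xml.etree modules; the function names are hypothetical:

```python
import math
import posixpath
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import unquote

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"
BYTES_PER_PAGE = 1024  # figure quoted in this thread


def spine_paths(epub):
    """Return the ZIP member names of the spine documents, in reading order."""
    container = ET.fromstring(epub.read("META-INF/container.xml"))
    opf_path = container.find(f".//{CONTAINER_NS}rootfile").attrib["full-path"]
    opf = ET.fromstring(epub.read(opf_path))
    base = posixpath.dirname(opf_path)
    # Manifest maps item ids to hrefs; the spine lists ids in reading order.
    hrefs = {item.attrib["id"]: item.attrib["href"]
             for item in opf.iter(f"{OPF_NS}item")}
    paths = []
    for itemref in opf.iter(f"{OPF_NS}itemref"):
        href = unquote(hrefs[itemref.attrib["idref"]])
        # Manifest hrefs are relative to the directory holding the OPF.
        paths.append(posixpath.normpath(posixpath.join(base, href)))
    return paths


def synthetic_page_counts(epub_file):
    """Return [(member name, page count)] for each spine document."""
    with zipfile.ZipFile(epub_file) as epub:
        return [
            (path, max(1, math.ceil(
                epub.getinfo(path).compress_size / BYTES_PER_PAGE)))
            for path in spine_paths(epub)
        ]
```

Images, stylesheets, and the OPF itself never enter the total because only members referenced from the spine are looked up. The sizes come from the ZIP directory, so nothing has to be decompressed or rendered.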

Jellby
06-01-2013, 03:06 AM
Actually I would have thought that with the ADE style of page numbering, no degree of changes to the CSS would affect the page number. After all, that is the whole idea behind the synthetic numbering.

Not if the CSS is in a separate file, but every change in the HTML files can change the page numbering, including simply adding spaces or comments in the code. Even changing the zip compression level does change the page numbering!

Jellby
06-01-2013, 03:12 AM
Finally found something that explains this in detail: http://bookclubs.barnesandnoble.com/t5/NOOK-Book-Discussion/EPUBs-and-page-numbering/td-p/691602.

There is a slight disagreement there. That page says:

"The first page will be 1212 characters long, and the second will be 1213 long."

While our wiki (http://wiki.mobileread.com/wiki/ADE#Page_numbers) (which was copied from what Adobe once had online, I believe), says:

"... round the number of characters per page up and let the last “page” contain less characters than the rest."
which would mean that the first page is 1213 characters and the second 1212.
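
The wiki's rule can be sketched as follows, assuming the 1212/1213 example refers to 2425 characters split over two pages (that total is inferred from the two figures; the function names are hypothetical):

```python
def page_lengths(char_count, page_count):
    """Split char_count characters over page_count pages, rounding the
    per-page size up so that the last page holds the remainder."""
    per_page = max(1, (char_count + page_count - 1) // page_count)  # integer ceil
    lengths = []
    remaining = char_count
    for _ in range(page_count):
        take = min(per_page, remaining)
        lengths.append(take)
        remaining -= take
    return lengths


def page_for_offset(char_offset, char_count, page_count):
    """Return the 1-based page containing a 0-based character offset."""
    per_page = max(1, (char_count + page_count - 1) // page_count)
    return min(char_offset // per_page, page_count - 1) + 1


print(page_lengths(2425, 2))           # [1213, 1212]
print(page_for_offset(1212, 2425, 2))  # 1
print(page_for_offset(1213, 2425, 2))  # 2
```

Because the result depends only on character offsets, changing the font size or margins leaves the page number for a given position untouched.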

mattcurtis
06-01-2013, 10:48 AM
There is a slight disagreement there. That page says:

"The first page will be 1212 characters long, and the second will be 1213 long."

While our wiki (http://wiki.mobileread.com/wiki/ADE#Page_numbers) (which was copied from what Adobe once had online, I believe), says:

"... round the number of characters per page up and let the last “page” contain less characters than the rest."
which would mean that the first page is 1213 characters and the second 1212.

That's what I thought, thanks. I'm not sure why my puddy-brain wasn't getting ADE pages.