02-02-2014, 04:06 PM
rpspringuel
Device: Kindle 4
Real Page Numbers

I've been exploring the apnx generator and really like how I can now get page numbers on my Kindle instead of just location numbers.

However, while the estimated page numbers are fine most of the time, as an academic it's sometimes important that I know exactly which page I'm on when constructing a citation. Doing this would require manually editing the ebook to mark where each page starts. That's a lot of work, but I only need to do it for a limited number of books, so I consider it a reasonable trade-off in some circumstances.

To that end, I'd like some feedback on how to make this work. My thoughts are as follows:
  1. Use a tag to mark pages. This tag should be unique and unlikely to appear in a book normally. It also should not display anything on the reader's screen, so that it doesn't interfere with reading the book. Normally this would lead me to use a special comment like "<!--page-->", but it appears that comments are not retained in an edited azw3 book. Is anyone familiar enough with the azw3 format to know what sort of tag could fulfill this requirement?
  2. In apnx.py, define a new function, get_pages_real, which scans the text the way get_pages_accurate does, except that instead of counting lines and marking a page every 30 lines, it simply marks a page whenever it encounters the tag mentioned above.
  3. Modify write_apnx so that the parameter "accurate" is no longer boolean but instead accepts three options: real, accurate, and fast. If real is requested and fails because there are no page markers in the text, the algorithm should spit out a warning and then try accurate. If it fails due to DRM, it should spit out a warning and then try fast (as the algorithm currently does for accurate).
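Steps 2 and 3 can be sketched in plain Python, independent of calibre. Everything below is an assumption for illustration: the `<a class="realpage"/>` marker, the `NoPageMarkers` and `DRMError` exceptions, and the `get_pages` dispatcher are hypothetical names, not calibre's API, and a page start is represented as a byte offset into the book text. `get_accurate` and `get_fast` stand in for the existing get_pages_accurate and get_pages_fast, whose real signatures may differ.

```python
import re
from typing import Callable, List

# Hypothetical marker: an empty anchor with a distinctive class name. Whether a
# tag like this survives conversion to azw3 is the open question in step 1,
# so the pattern is a parameter rather than hard-coded into the scan.
DEFAULT_MARKER = re.compile(rb'<a\s+class="realpage"\s*/?>', re.IGNORECASE)


class NoPageMarkers(Exception):
    """The book's text contains no real-page markers."""


class DRMError(Exception):
    """Stand-in for whatever error signals that the text is DRM protected."""


def get_pages_real(text: bytes, marker=DEFAULT_MARKER) -> List[int]:
    """Return the byte offset at which each marked page starts."""
    pages = [m.start() for m in marker.finditer(text)]
    if not pages:
        raise NoPageMarkers('no page markers found in text')
    return pages


def get_pages(mode: str,
              read_text: Callable[[], bytes],
              get_accurate: Callable[[bytes], List[int]],
              get_fast: Callable[[], List[int]],
              warn: Callable[[str], None] = print) -> List[int]:
    """Pick page offsets using mode 'real', 'accurate' or 'fast'.

    read_text() returns the book text or raises DRMError. get_fast() needs
    no text, so it still works on DRM-protected books.
    """
    if mode not in ('real', 'accurate', 'fast'):
        raise ValueError(f'unknown mode: {mode!r}')
    if mode in ('real', 'accurate'):
        try:
            text = read_text()
        except DRMError:
            # Both 'real' and 'accurate' need the text, so go straight to fast.
            warn('Book is DRM protected; falling back to fast.')
            return get_fast()
        if mode == 'real':
            try:
                return get_pages_real(text)
            except NoPageMarkers:
                warn('No page markers found; falling back to accurate.')
        return get_accurate(text)
    return get_fast()
```

Reading the text once up front keeps the DRM check in a single place, and the real-to-accurate fallback reuses the text that was already read.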

On a related note, does anyone know how apnx files handle front matter pages that are numbered with roman numerals, with the page count then resetting when the main matter of the book starts?
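To pin down the numbering scheme in question, here is a small sketch of the labels such a book would need. It says nothing about whether or how the apnx format stores labels like these; `to_roman`, `page_labels`, and the `front_matter_pages` count are hypothetical, and in practice that count would have to come from the markup somehow.

```python
from typing import List


def to_roman(n: int) -> str:
    """Convert a positive integer to lowercase roman numerals."""
    numerals = [(1000, 'm'), (900, 'cm'), (500, 'd'), (400, 'cd'),
                (100, 'c'), (90, 'xc'), (50, 'l'), (40, 'xl'),
                (10, 'x'), (9, 'ix'), (5, 'v'), (4, 'iv'), (1, 'i')]
    out = []
    for value, symbol in numerals:
        # Greedily subtract the largest value that still fits.
        while n >= value:
            out.append(symbol)
            n -= value
    return ''.join(out)


def page_labels(total_pages: int, front_matter_pages: int) -> List[str]:
    """Roman numerals for the front matter, then arabic numbering restarting at 1."""
    labels = []
    for i in range(total_pages):
        if i < front_matter_pages:
            labels.append(to_roman(i + 1))
        else:
            labels.append(str(i - front_matter_pages + 1))
    return labels


print(page_labels(6, 3))  # ['i', 'ii', 'iii', '1', '2', '3']
```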

I should note that I can program in Python, and thus could make the necessary modifications to apnx.py myself. However, I don't know how to integrate those changes into calibre's user interface. My coding work has all been for people who can read and manipulate source code, and I've never worried about a user interface before (beyond simple raw_input/input prompts). Thus, while I'm perfectly willing to do the under-the-hood work, I'll need some help getting it integrated.