05-08-2014, 05:50 PM   #14
rpspringuel
Device: Kindle 4
Okay, so I finally got around to doing something here and have come up with something that appears to work for me.

What I ended up doing was as follows:
  1. Instead of writing a new function (and thus having to figure out how to get calibre to call it), I hijacked the get_pages_exact function that already exists in apnx.py. I did not eliminate the code that was already there, but rather modified it so that if the incoming page_count is negative, my new code is run. If page_count is positive, the original code is run. I'm open to changing this, but would need help figuring out how to tell calibre what the new function is and under what circumstances to use it.
  2. My code scans the file looking for tags which contain "pagebreak". I chose to look only for that string because it is common to both the ePub standard I was thinking of using above and the mobi tag "<mbp:pagebreak/>"* which is used to force manual page breaks in mobi files. However, since I look at every tag and check only for "pagebreak" and nothing else, it's also possible to use things like 'data-pagebreak="2"' in an existing tag to mark a page break (indeed, that's what I ended up doing for my test book).
  3. While I was originally going to try to differentiate between page numbering sequences (like i, ii, iii, etc. for front matter and 1, 2, 3, etc. for main matter), I determined that doing so would require a more significant rewrite of John's work and thus decided to forgo that. As a result, the apnx files produced when my code is run start at 1 and run sequentially, just like John's do.
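
For anyone who wants the idea without opening the zip, here is a minimal sketch of points 1 and 2. It is not the code in the attached apnx.py. The names find_pagebreak_offsets and get_pages_sketch are hypothetical, it works on a plain string rather than on the book file, and the positive branch is only a stand-in (an even split), not what calibre's existing code actually does.

```python
import re

# Rough pattern for "anything that looks like a tag". This is an assumption
# made for the sketch; it is not a full HTML/XML parser.
TAG_RE = re.compile(r"<[^<>]+>")


def find_pagebreak_offsets(markup, marker="pagebreak"):
    """Return the start offset of every tag whose text contains `marker`.

    Because the whole tag text is checked, both <mbp:pagebreak/> and an
    attribute such as data-pagebreak="2" on an existing tag will match.
    """
    return [m.start() for m in TAG_RE.finditer(markup) if marker in m.group(0)]


def get_pages_sketch(markup, page_count):
    """Hypothetical dispatcher keyed on the sign of page_count."""
    if page_count < 0:
        # Negative count: use the marked page breaks.
        return find_pagebreak_offsets(markup)
    if page_count == 0:
        return []
    # Placeholder for the original code path: split the text evenly.
    step = len(markup) // page_count
    return [i * step for i in range(page_count)]


sample = '<p>a</p><mbp:pagebreak/><p data-pagebreak="2">b</p>'
marked = get_pages_sketch(sample, -1)
```

Overloading the sign of page_count keeps the existing call site untouched, at the cost of a less obvious interface; a separate function would be cleaner once it's clear how calibre should select it.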

Attached are my version of apnx.py (zipped up) and a book in azw3 format with the pages already marked.

Edit: I've now attached a new book which I believe to be out of copyright. It was published in 1926; the original author died in 1868 and the translator died in 1902. It has 93 pages of about 27 lines of ~47 characters each. The book also has 14 pages of front matter which are not included in the page count. I've only marked the pages in the main body, so you should get 93 pages using my code, with page 1 occurring after the table of contents.

I'd appreciate it if others could test it out and provide advice on the implementation.

*I should note that the "<mbp:pagebreak/>" tag suffers from the same problem with the colon being replaced by u0003a that I described earlier.


Copyrighted material may not be posted on Mobileread. Removed.
Attached Files
File Type: zip apnx.py.zip (4.0 KB, 414 views)
File Type: azw3 Jesus Christ_ The Model of the Priest - Joseph Frassinetti.azw3 (83.2 KB, 402 views)

Last edited by rpspringuel; 05-13-2014 at 01:32 PM. Reason: New book upload.