MobileRead Forums - View Single Post

jhowell · 01-29-2020, 01:25 PM

Quote:

Originally Posted by j.p.s

I'm starting to think publishers are using custom apnx generators or modifying apnx files for some reason. Sometimes there are no page targets present at all. When there are targets, I haven't been able to figure out whether the errors are systematic.

I doubt the publishers have anything to do with the APNX other than providing the NCX pageList or NAV page-list used by kindlegen.

As far as I can tell, kindlegen builds PAGE records for MOBI7 and KF8 that accurately map each page in the EPUB pageList/page-list to the offset of the starting '<' of the element containing its ID in the raw HTML content. It appears that Amazon manipulates that data in the publishing workflow when it processes PAGE records to produce APNX files.
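To make the offset being described concrete, here is a rough Python sketch of finding the starting '<' of the element that carries a given ID in the raw HTML bytes. The function name `element_start_offset` and the regex approach are my own illustration, not how kindlegen is implemented, and it assumes quoted id attributes with no '<' or '>' inside attribute values.

```python
import re

def element_start_offset(raw_html: bytes, target_id: str):
    """Return the byte offset of the '<' that opens the tag carrying
    id=target_id, or None if no such tag is found.

    Hypothetical helper for illustration only; kindlegen's real logic is unknown.
    """
    id_bytes = re.escape(target_id.encode("utf-8"))
    # Match an opening tag, then an id attribute preceded by whitespace
    # (so data-id="..." is not mistaken for id="...").
    pattern = re.compile(
        rb"<[A-Za-z][^<>]*?\sid\s*=\s*([\"'])" + id_bytes + rb"\1[^<>]*>"
    )
    match = pattern.search(raw_html)
    return match.start() if match else None
```

Comparing the offsets this kind of search yields against the values in an APNX is one way to see how far Amazon's numbers have drifted from the element start.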

Some Kindle reading apps/devices use the APNX position array, when an APNX is present, to calculate the percent complete shown to the user. The APNX position array often has a number of unlabeled zero entries added to the beginning. I suspect that Amazon adds these entries to account for the amount of front matter before the first assigned page number in order to have the percent shown come out more accurately.
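As a toy model of why padding could help: I don't know the formula the apps actually use, so the sketch below simply assumes that percent complete is the number of position entries before the current one divided by the total number of entries. The name `percent_complete` and the sample numbers are made up for illustration.

```python
from bisect import bisect_right

def percent_complete(positions, cur):
    """Toy model (an assumption, not Amazon's formula): percent is the count
    of position entries before the current one, over the total entry count."""
    if not positions:
        return 0.0
    pages_started = bisect_right(positions, cur)  # entries at or before cur
    pages_before = max(pages_started - 1, 0)      # exclude the current entry
    return 100.0 * pages_before / len(positions)

# Hypothetical 4000-byte book: front matter in bytes 0-999, numbered pages
# starting at 1000, 2000 and 3000.
unpadded = [1000, 2000, 3000]
padded = [0] + unpadded  # one unlabeled zero entry standing in for front matter
```

Under this model a reader at offset 1000 is shown 0% with the unpadded array but 25% with the padded one, which matches the byte position (1000 of 4000). That is the kind of correction I suspect the zero entries are for.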

One or more extra entries are often added to the end of the APNX position array, mapped to empty labels. This may also have to do with percentage, but more importantly it prevents the final page number from being shown throughout the rest of a book that contains unnumbered back matter after the last numbered page.
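The effect of a trailing empty-label entry can be sketched with a simple lookup. This assumes the reader shows the label of the last entry whose position is at or before the current offset; `page_label_at` and the sample positions are hypothetical, not taken from a real APNX.

```python
from bisect import bisect_right

def page_label_at(positions, labels, cur):
    """Return the label of the last entry whose position is <= cur,
    or an empty string if cur is before the first entry.
    Assumed lookup rule, for illustration only."""
    idx = bisect_right(positions, cur) - 1
    return labels[idx] if idx >= 0 else ""

# Hypothetical book: numbered pages 1-3, then unnumbered back matter
# starting at offset 3500.
positions = [1000, 2000, 3000]
labels = ["1", "2", "3"]

# Same book with one extra trailing entry mapped to an empty label.
padded_positions = positions + [3500]
padded_labels = labels + [""]
```

Without the extra entry, an offset of 4900 deep in the back matter still reports page "3". With it, the same offset reports an empty label, so no page number is shown.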

Why the positions are sometimes off by varying small amounts (tens of bytes) is harder to explain. Values appear to be adjusted to correspond to an explicit page break (such as <body> in KF8 or <mbp:pagebreak/> in MOBI7) or to a character of text that will be visible to the reader. My best guess is that Amazon is making adjustments that produce offsets that work better with the mapping that is done between equivalent positions of visible text in the MOBI7 and KF8 formats. This mapping is needed to make notes and highlights match exactly between formats. (Also, associating a printed page with a set of visible characters, rather than particular HTML markup, makes some logical sense.)
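One adjustment that would produce shifts of this size is moving an offset forward from the element's '<' to the first visible character of text. The sketch below is only my guess at that kind of rule: `snap_to_visible_text` is a made-up name, it ignores the page-break case entirely, and it treats anything outside a tag that is not whitespace as visible.

```python
def snap_to_visible_text(raw_html: bytes, offset: int) -> int:
    """Advance from offset past tags and whitespace to the first byte of
    visible text. Returns the original offset if none is found.
    A guess at the kind of adjustment involved, not Amazon's actual rule."""
    i, n = offset, len(raw_html)
    while i < n:
        ch = raw_html[i:i + 1]
        if ch == b"<":
            end = raw_html.find(b">", i)
            if end == -1:       # unterminated tag, give up
                return offset
            i = end + 1         # skip the whole tag
        elif ch.isspace():
            i += 1              # skip whitespace between tags
        else:
            return i            # first visible character
    return offset
```

For markup like `<p id="page_1"><span> Hello</span></p>`, an offset pointing at the `<p` would move forward 22 bytes to the "H", which is in the same range as the discrepancies I have seen.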

If you have other ideas about what is going on during APNX generation, I would be interested to hear them.