MobileRead Forums - View Single Post

j.p.s · 02-01-2020, 01:41 PM

Quote:

Originally Posted by jhowell

I doubt the publishers have anything to do with the APNX other than providing the NCX pageList or NAV page-list used by kindlegen.

Fair enough. I know nothing about publisher or amazon workflow, where the boundaries are, or whether the boundaries can shift.

Quote:

As far as I can tell kindlegen builds PAGE records for MOBI7 and KF8 that accurately map the offset of the starting '<' of the element containing the ID in the raw HTML content for each page in the EPUB pageList/page-list. It appears that Amazon is manipulating that data in the publishing workflow when processing PAGE records to produce APNX files.

I am also completely ignorant of the details of kindle format internals, or even PalmDB internals. I guess I should start picking that up. I had been assuming that kindlegen embeds APNX files in the MOBI it produces, but I guess that is just as false as thinking that it embeds an EPUB (setting aside "append source").
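For anyone else starting on PalmDB internals: MOBI and AZW3 files are PalmDB containers, and individual records (presumably including the PAGE records mentioned above) are found through the record list that follows the fixed 78-byte header. Below is a minimal Python sketch of reading that header, based on the standard PalmDB layout. The function name `read_palmdb_header` is illustrative only, not part of any existing tool.

```python
import struct


def read_palmdb_header(data: bytes) -> tuple[str, str, list[int]]:
    """Return (type, creator, record_offsets) from a PalmDB container.

    Big-endian layout: a 32-byte name, then fixed fields, with the 4-byte
    type at offset 60, the 4-byte creator at 64 and a 2-byte record count
    at 76. The record list starts at 78, 8 bytes per record: a 4-byte data
    offset, 1 attribute byte and a 3-byte unique id.
    """
    if len(data) < 78:
        raise ValueError("too short for a PalmDB header")
    db_type = data[60:64].decode("latin-1")
    creator = data[64:68].decode("latin-1")
    (num_records,) = struct.unpack_from(">H", data, 76)
    if len(data) < 78 + 8 * num_records:
        raise ValueError("record list is truncated")
    # Only the 4-byte offset of each 8-byte record-list entry is kept.
    offsets = [struct.unpack_from(">I", data, 78 + 8 * i)[0] for i in range(num_records)]
    return db_type, creator, offsets
```

Fed the bytes of a .mobi or .azw3 file, it should normally report type "BOOK" and creator "MOBI"; each record then runs from its offset to the next record's offset.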

Based on the APNX produced by kindlegen, I agree that those maps point to the starting '<' and that the APNX files delivered by amazon are often offset by a few (or many) bytes. Sometimes an incorrect element seems to have been targeted.
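One way to check a delivered APNX against the kindlegen behavior described above is to compute, for each page id, the byte offset of the '<' that opens the element carrying it, and then diff those values against the delivered position array. A minimal sketch, assuming the raw HTML is available as bytes (for example from a KindleUnpack extraction) and that the two arrays are already aligned entry for entry. Both function names are hypothetical.

```python
import re


def page_target_offsets(raw_html: bytes, page_ids: list[str]) -> list[int]:
    """For each page id, return the byte offset of the '<' that opens the
    element carrying that id.

    Sketch only: assumes the id appears as id="..." or id='...' in a start
    tag and that no stray '<' sits inside attribute values before it.
    """
    offsets = []
    for pid in page_ids:
        # Match id="pid" but not data-id="pid" or similar attribute names.
        pattern = rb"(?<![\w-])id\s*=\s*[\"']" + re.escape(pid.encode("utf-8")) + rb"[\"']"
        match = re.search(pattern, raw_html)
        if match is None:
            raise KeyError(f"id {pid!r} not found")
        # Walk back from the attribute to the '<' that starts its tag.
        offsets.append(raw_html.rfind(b"<", 0, match.start()))
    return offsets


def position_deltas(expected: list[int], delivered: list[int]) -> list[tuple[int, int]]:
    """Report (index, delivered - expected) for every entry that differs.
    Leading/trailing padding entries should be trimmed before calling."""
    if len(expected) != len(delivered):
        raise ValueError("arrays differ in length; align them first")
    return [(i, d - e) for i, (e, d) in enumerate(zip(expected, delivered)) if d != e]
```

Small deltas would match the "offset by a few bytes" pattern, while a large delta on a single entry is more likely a case where a different element was targeted.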

When I generated an EPUB from asciidoc marked up text which included page markers that I inserted myself, I noticed that page number entries in kindle GOTO dialogs showing the TOC were 1 too low and that the same would happen for the first page number in a chapter when reading the book. I had assumed that had to do with the asciidoc to epub conversion process, but I saw that it sometimes happened with amazon supplied APNX files. So I understand the motivation to manipulate the mapping. Since it was inconsistent, I assumed it was the publishers doing the manipulating.
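A plausible mechanism for the "1 too low" symptom, assuming the reader labels a position with the last page whose start is at or before it: if a chapter's TOC entry points at the heading, but the page marker for that chapter's first page lands a few bytes later, the lookup falls back to the previous page. A small sketch of that lookup; `page_label_at` is a hypothetical helper, not a Kindle API.

```python
from bisect import bisect_right


def page_label_at(positions: list[int], labels: list[str], offset: int) -> str:
    """Return the label of the last page whose start position is <= offset,
    or '' if offset precedes every page. Assumes positions are sorted
    ascending with one label per position."""
    i = bisect_right(positions, offset) - 1
    return labels[i] if i >= 0 else ""


# Chapter 2's heading starts at byte 2040, but the marker for page 3
# ended up at 2050, slightly inside the heading.
labels = ["1", "2", "3"]
print(page_label_at([0, 1000, 2050], labels, 2040))  # 2 (one too low)
print(page_label_at([0, 1000, 2040], labels, 2040))  # 3 (marker at heading start)
```

Shifting the page position back to (or before) the heading start is enough to make the TOC entry and the first page of the chapter agree.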

Quote:

Some Kindle reading apps/devices use the APNX position array, when an APNX is present, to calculate the percent complete shown to the user. The APNX position array often has a number of unlabeled zero entries added to the beginning. I suspect that Amazon adds these entries to account for the amount of front matter before the first assigned page number in order to have the percent shown come out more accurately.
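To make the front-matter idea concrete, here is one hypothetical way such padding could be sized: prepend enough unlabeled zero entries that the front matter counts as roughly as many "pages" as it would at the book's average page length. This is a guessed heuristic for illustration only; neither the rule nor the way a given reader turns entry counts into a percentage is known from the thread.

```python
def pad_front_matter(positions: list[int], labels: list[str]) -> tuple[list[int], list[str]]:
    """Prepend unlabeled zero entries sized to the front matter that precedes
    the first numbered page. Hypothetical heuristic, not Amazon's algorithm."""
    first = positions[0]
    if first == 0 or len(positions) < 2:
        return positions, labels  # nothing before page 1, or too little data
    # Average bytes per numbered page, measured across the numbered pages.
    avg = (positions[-1] - first) / (len(positions) - 1)
    n = round(first / avg) if avg > 0 else 0
    return [0] * n + positions, [""] * n + labels
```

With numbered pages at 3000, 4000, 5000 and 6000 bytes, the average page is 1000 bytes, so three zero entries would be prepended for the 3000 bytes of front matter.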

When I first started paying attention to page numbers on kindles, I wondered whether page numbers would be used for % in book, but as near as I can tell on my kindles, "location" is still used. I've posted elsewhere about books with extensive, heavily formatted end matter where 50% to 70% is shown at the end of the last chapter, even though the page number there is a much higher percentage of the total pages.

I agree that one reason for zeroed entries at the beginning of the map is that page 1 is well into the book. Frequently the pbook has roman numerals for those pages and the ebook shows no label for them. But some amazon-delivered APNX files do have roman numeral map entries. That is part of why I suspected the behavior varied by publisher rather than being amazon shenanigans, but it would not be surprising if amazon is inconsistent in its process. (Often the roman numeral page id elements are in the raw book HTML, but not in the amazon-delivered APNX.)
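The two front-matter behaviors described above (roman numeral labels versus unlabeled entries) differ only in how the labels for those pages are filled in. A small sketch that builds a label list either way; `front_matter_labels` and `to_roman` are illustrative helpers, and the choice of lowercase numerals is an assumption based on common print practice.

```python
def to_roman(n: int) -> str:
    """Lowercase roman numeral for 1 <= n < 4000."""
    pairs = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
             (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
             (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, numeral in pairs:
        while n >= value:
            out.append(numeral)
            n -= value
    return "".join(out)


def front_matter_labels(n_front: int, n_body: int, roman: bool) -> list[str]:
    """Labels for n_front front-matter pages followed by n_body numbered pages.
    Front matter gets roman numerals if roman is True, otherwise empty labels."""
    front = [to_roman(i) if roman else "" for i in range(1, n_front + 1)]
    return front + [str(i) for i in range(1, n_body + 1)]


print(front_matter_labels(3, 2, roman=True))   # ['i', 'ii', 'iii', '1', '2']
print(front_matter_labels(3, 2, roman=False))  # ['', '', '', '1', '2']
```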

Quote:

One or more extra entries are often added to the end of the APNX position array, mapped to empty labels. This may also have to do with percentage, but more importantly it prevents the final page number from being shown for the entire remainder of books that contain unnumbered back matter after the last numbered page.
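Why a trailing empty-label entry helps, assuming the reader shows the label of the last page starting at or before the current position: without it, everything after the last numbered page inherits that page's label; with it, the back matter gets an empty label. A minimal sketch with hypothetical helper names.

```python
from bisect import bisect_right


def page_label_at(positions: list[int], labels: list[str], offset: int) -> str:
    """Label of the last page starting at or before offset ('' if none)."""
    i = bisect_right(positions, offset) - 1
    return labels[i] if i >= 0 else ""


def close_numbered_range(positions: list[int], labels: list[str],
                         back_matter_start: int) -> tuple[list[int], list[str]]:
    """Append an empty-label entry where unnumbered back matter begins."""
    if positions and back_matter_start <= positions[-1]:
        raise ValueError("back matter must start after the last numbered page")
    return positions + [back_matter_start], labels + [""]


positions, labels = [0, 1000, 2000], ["1", "2", "3"]
print(page_label_at(positions, labels, 5000))      # 3 (deep in the back matter)
closed = close_numbered_range(positions, labels, 3000)
print(repr(page_label_at(*closed, 5000)))          # '' (no page number shown)
```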

One of the first problems I noticed with amazon page numbers was a "final" page number that showed for the rest of the book. It stuck out like a sore thumb in the "GOTO" TOC, where quite a few TOC entries had that "final" page number. I assumed it was a glitch in the (presumably EPUB) book source, but it turned out that putting a proper pagelist or page-map in the KindleUnpack-generated EPUB, feeding that to kindlegen, extracting the APNX, and using it with the original amazon-supplied azw3 fixed the page number display for that book.
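For the "proper pagelist" step, an EPUB 3 navigation document can carry a `nav` element with `epub:type="page-list"` (EPUB 2 books use a `pageList` in the NCX instead). A minimal sketch that generates the EPUB 3 form from (label, href) pairs; the function name and the example hrefs are hypothetical, and the output still has to be placed in the nav document, which must declare the `epub` namespace.

```python
from html import escape


def epub3_page_list(entries: list[tuple[str, str]]) -> str:
    """Build an EPUB 3 page-list nav from (label, href) pairs, in reading order."""
    items = "\n".join(
        f'    <li><a href="{escape(href)}">{escape(label)}</a></li>'
        for label, href in entries
    )
    return f'<nav epub:type="page-list" hidden="">\n  <ol>\n{items}\n  </ol>\n</nav>'


print(epub3_page_list([("1", "ch01.xhtml#page1"), ("2", "ch01.xhtml#page2")]))
```

The `hidden` attribute keeps the list out of the visible table of contents while leaving it available to tools that read page lists.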

Quote:

Why the positions are sometimes off by varying small amounts (tens of bytes) is harder to explain. Values appear to be adjusted to correspond to an explicit page break (such as <body> in KF8 or <mbp:pagebreak/> in MOBI7) or to a character of text that will be visible to the reader. My best guess is that Amazon is making adjustments that produce offsets that work better with the mapping that is done between equivalent positions of visible text in the MOBI7 and KF8 formats. This mapping is needed to make notes and highlights match exactly between formats. (Also, associating a printed page with a set of visible characters, rather than particular HTML markup, makes some logical sense.)

Could be, and I am fine with that if true.

But in my very tiny sample, the incidence of glitches and outright SNAFUs is very high. That's annoying, but the good news is that it is possible for individuals to make the fix themselves.
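On the guess that positions are being moved from the opening '<' to a character of visible text: a minimal sketch of that kind of adjustment, which skips markup and whitespace from a given offset and returns the offset of the first visible byte. It is only an illustration of the idea, not Amazon's actual rule; it ignores explicit page breaks, comments containing '>', entities, and the contents of script/style elements, and `snap_to_visible_text` is a hypothetical name.

```python
import re

_TAG = re.compile(rb"<[^>]*>")  # naive tag matcher, good enough for a sketch


def snap_to_visible_text(raw_html: bytes, offset: int) -> int:
    """From offset, skip tags and whitespace and return the offset of the
    first byte of visible text, or len(raw_html) if none follows."""
    pos = offset
    while pos < len(raw_html):
        tag = _TAG.match(raw_html, pos)
        if tag:
            pos = tag.end()          # jump over a whole tag
        elif raw_html[pos:pos + 1].isspace():
            pos += 1                 # skip ASCII whitespace between tags
        else:
            return pos               # first visible character
    return len(raw_html)


html = b'<div id="p5">\n  <p class="x"><b>Hello</b></p></div>'
print(snap_to_visible_text(html, 0) == html.index(b"Hello"))  # True
```

Comparing such snapped offsets with a delivered APNX could help tell whether a given shift is this kind of nudge or a genuinely wrong target.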

Quote:

If you have other ideas about what is going on during APNX generation I would be interested to hear them.

I seem to be better at detecting the problems and coming up with strategies to fix than determining why they happened in the first place, but I certainly welcome the discussion and will try to contribute when I can.