Thought some might find this interesting:
As already noted, there is an .apnx file, which appears to be what adds page numbers to a given book. In a hex dump you can see a string table/dictionary at the top (this one is for 'The Girl Who Kicked the Hornet's Nest'):
{"contentGuid" : "78a941d9", "asin" : "B0031YJFCQ", "codeType" : "EBOK", "fileRevisionId" : "1"} - {"pageMap" : "(1, a, 1)", "asin" : "030726999X"}
So we see both the Amazon ASIN and print edition ISBN here.
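The second "asin" value has the shape of an ISBN-10, which Amazon uses as the ASIN for print books. Here is a quick sketch (Python, standard library only) that parses both strings as JSON and checks the ISBN-10 check digit. The closing quote after "(1, a, 1)" is added on the assumption that its absence was a transcription slip, and isbn10_is_valid is just an illustrative helper, not anything from Amazon.

```python
import json

# Header strings as transcribed from the .apnx hex dump above. The closing
# quote after (1, a, 1) is added (assumed transcription slip) so both parse as JSON.
content_header = ('{"contentGuid" : "78a941d9", "asin" : "B0031YJFCQ", '
                  '"codeType" : "EBOK", "fileRevisionId" : "1"}')
page_header = '{"pageMap" : "(1, a, 1)", "asin" : "030726999X"}'


def isbn10_is_valid(isbn: str) -> bool:
    """Hypothetical helper: weights 10..1 times digits must sum to a multiple of 11."""
    if len(isbn) != 10:
        return False
    total = 0
    for i, ch in enumerate(isbn):
        if ch in "Xx" and i == 9:
            value = 10  # 'X' means 10 and is only allowed as the check digit
        elif ch.isdigit():
            value = int(ch)
        else:
            return False
        total += (10 - i) * value
    return total % 11 == 0


content = json.loads(content_header)
pages = json.loads(page_header)
print("Kindle ASIN:", content["asin"])
print("Print ISBN-10:", pages["asin"], isbn10_is_valid(pages["asin"]))
```

The weighted sum for 030726999X comes to 209 = 11 × 19, so the check digit is consistent with it being a real ISBN-10, whereas B0031YJFCQ (a Kindle-only ASIN) is not one.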
This is followed by an array of 16-byte values that appear to form a sequence of numbers in ascending order. My guess is that each one is an offset to the position where a given physical page begins. The number of 16-byte values is very close to the number of pages in the book (a few additional rows of bytes precede the presumed page map itself, and may have some special significance).
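If that guess is right, reading the page map comes down to unpacking a run of fixed-width integers and checking that they ascend. The sketch below assumes big-endian unsigned integers and defaults to a 4-byte width; both the width and the byte order are assumptions, so adjust them to whatever the hex dump actually shows. The read_page_offsets name and the sample bytes are made up for illustration.

```python
def read_page_offsets(data: bytes, count: int, width: int = 4) -> list[int]:
    """Hypothetical helper: unpack `count` big-endian unsigned ints of `width` bytes.

    Width and byte order are assumptions, not confirmed by Amazon.
    """
    if len(data) < count * width:
        raise ValueError("not enough bytes for the requested number of offsets")
    return [
        int.from_bytes(data[i * width:(i + 1) * width], "big")
        for i in range(count)
    ]


def is_ascending(offsets: list[int]) -> bool:
    """True if every offset is strictly greater than the one before it."""
    return all(a < b for a, b in zip(offsets, offsets[1:]))


# Made-up sample: four page-start offsets packed as 4-byte big-endian integers.
sample = bytes.fromhex("00000000" "00000a2c" "00001450" "00001e7c")
offsets = read_page_offsets(sample, count=4)
print(offsets)                # [0, 2604, 5200, 7804]
print(is_ascending(offsets))  # True
```

A strictly ascending run like this is what you would expect if each value marks the start of the next page; a value that goes backwards would suggest the guessed width or starting point is wrong.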
In the book I'm looking at, pages before page "1" do not have page numbers (i, ii, iii, iv, etc.). (I wonder whether that's a limitation of Amazon's page-mapping scheme or just what they did for this particular book.) I'd also note that the last page number (in this case '563') was applied to content that almost certainly spans more than one physical page. The ebook edition puts the copyright page and a cover image at the end, and these should not be labeled as page '563'. Okay, so it is not perfect, at least in this case.
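Treating each offset as a page start would explain both observations: a lookup finds the last page whose start is at or before a given position, so anything before the first offset gets no page number, and everything after the final offset (copyright page, cover image) inherits the last label. A minimal sketch using Python's bisect module; the offsets and labels are invented for illustration.

```python
import bisect
from typing import Optional

# Invented page map: start offset of each labeled page, in ascending order.
page_starts = [1000, 2600, 5200, 7800]
page_labels = ["1", "2", "3", "563"]  # pretend the last entry is the final numbered page


def page_for_position(pos: int) -> Optional[str]:
    """Return the label of the page containing `pos`, or None before page "1".

    bisect_right counts the page starts that are <= pos; the page just
    before that insertion point is the one the position falls on.
    """
    i = bisect.bisect_right(page_starts, pos)
    if i == 0:
        return None  # front matter before the first offset has no page number
    return page_labels[i - 1]


print(page_for_position(500))    # None (front matter)
print(page_for_position(2600))   # 2
print(page_for_position(99999))  # 563, even far past the end of the text
```

Nothing in a start-offsets-only map says where the last page ends, which is one plausible reason trailing material ends up labeled with the final page number.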
Presumably this scheme also works with Topaz-format books. Generating the page map would be a job Amazon has to take on, and it's something they can do after material is submitted to them for publishing.
It's not clear how self-published books can get page numbers, since 'locations' don't exist until you bake the .azw file. Maybe it leverages the NCX page list? (The latest version of KindleGen appears to store a copy of the source files.)
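For reference, an NCX page list (the pageList element of the NCX format used alongside OPF in EPUB 2) maps printed page labels to locations in the source content. Whether KindleGen or Amazon actually uses it to build the .apnx is pure speculation. The sketch below only shows what that data looks like and how the label/target pairs could be pulled out with Python's standard xml.etree.ElementTree. The sample is a trimmed, invented fragment (a real NCX also needs head, docTitle and navMap), and read_page_list is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NCX_NS = {"ncx": "http://www.daisy.org/z3986/2005/ncx/"}

# Invented, trimmed NCX fragment with a pageList (file names and ids are made up).
SAMPLE_NCX = """<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <pageList>
    <pageTarget id="page-1" type="normal" value="1" playOrder="10">
      <navLabel><text>1</text></navLabel>
      <content src="chapter01.html#page1"/>
    </pageTarget>
    <pageTarget id="page-2" type="normal" value="2" playOrder="11">
      <navLabel><text>2</text></navLabel>
      <content src="chapter01.html#page2"/>
    </pageTarget>
  </pageList>
</ncx>"""


def read_page_list(ncx_xml: str) -> list[tuple[str, str]]:
    """Return (page label, content src) pairs from an NCX pageList."""
    root = ET.fromstring(ncx_xml)
    pairs = []
    for target in root.findall("ncx:pageList/ncx:pageTarget", NCX_NS):
        label = target.findtext("ncx:navLabel/ncx:text", default="", namespaces=NCX_NS)
        content = target.find("ncx:content", NCX_NS)
        src = content.get("src", "") if content is not None else ""
        pairs.append((label, src))
    return pairs


print(read_page_list(SAMPLE_NCX))
# [('1', 'chapter01.html#page1'), ('2', 'chapter01.html#page2')]
```

A tool that wanted page numbers for a self-published book could, in principle, resolve each src to a position in the built file and emit those positions as page starts, but that step is exactly the part that is unknown here.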
There are also two other file extensions that may come with any Kindle Store book now (not just those with page numbers):
.ea - an XML file containing the data for the 'Customers who bought this book also bought' and 'More by this author' lists that now appear after the last page of the book, including each title's ASIN so the reader can jump to its Kindle Store page.
.phl - an XML file that identifies the position offsets of popular highlights in the book. This has probably been around for a while, since the popular highlights feature was introduced for the K2/DX.
I assume the .apnx comes along when you download the book, while the other two might be updated with each sync.