MobileRead Forums - View Single Post

j.p.s · 12-07-2019, 02:43 PM

Quote:

Originally Posted by KevinH

Yes, the "bad" ones definitely look bad and the "fix" ones look much better. I am surprised that this happens, since in older mobi 6 and mobi 7 internal links are filepos info (file offsets), and in newer mobi8 they encode a base 32 file offset into a character-based "id-like" equivalent. Both kinds of file offset should be quite precise and not lead to what you are seeing.

Is it just moving in the wrong direction to get the exact link text? Are the "bad" and "fix" targets in any way close together?

That is very strange.

KevinH

I've played with this some more as I get bits of time and better understanding of apnx.

I paginated an EPUB by hand by inserting anchors based on a PDF scan of the book and generated a pagelist from a list of the anchors. The apnx files generated by running kindlegen on the EPUB and kindleunpack on the kindlegen output point to the opening "<" of the anchor for both the mobi7 and mobi8 raw markup. (The mobi7 markup has empty <a ></a>.)
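For anyone wanting to try the same thing, here is a rough sketch of the pagelist step. It is not the script I used; it assumes the pagelist is written as an EPUB 3 page-list nav, and the function name build_page_list and the sample hrefs are made up for illustration.

```python
from html import escape


def build_page_list(anchors):
    """Build an EPUB 3 page-list nav from (href, label) pairs.

    Hypothetical helper: each href is assumed to point at a hand-inserted
    page anchor, e.g. ("ch01.xhtml#page_1", "1").
    """
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for href, label in anchors
    )
    return (
        '<nav epub:type="page-list" hidden="">\n'
        "  <ol>\n"
        f"{items}\n"
        "  </ol>\n"
        "</nav>"
    )


if __name__ == "__main__":
    print(build_page_list([("ch01.xhtml#page_1", "1"), ("ch01.xhtml#page_2", "2")]))
```

The resulting fragment would go into the EPUB's navigation document; the namespace declaration for the epub: prefix is left out here.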

I had not previously looked into the pagination of books in which I had not noticed any problem when reading. I wrote a script to dump the page table of offsets at the end of an apnx file and, optionally, 16 characters from the raw markup (assembled_text.dat) beginning at each offset. No commercial book matched perfectly at every page, but a few came close, with a couple actually matching on almost every page. Some were off by small amounts, others by larger ones. Sometimes the discrepancy was not a fixed amount. A few did not have anchors or spans that indicated page boundaries, so I have no idea how accurate the apnx offsets are.
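The idea of the dump can be sketched in a few lines of Python. This is not the attached Perl script, only an illustration, and the apnx layout in the comments is an assumption based on my understanding of what calibre's APNX writer produces; the function names are hypothetical.

```python
import struct


def read_page_offsets(apnx_bytes):
    """Return the page offsets stored at the end of an apnx file.

    Assumed layout (all big-endian), not verified against every apnx variant:
      u32 version, u32 start of page header block, u32 content header length,
      content header JSON;
      then at the page header block: u16 unknown, u16 page header JSON length,
      u16 page count, u16 bits per entry (expected 32), page header JSON,
      followed by one u32 offset per page.
    """
    _, block_start, _ = struct.unpack_from(">III", apnx_bytes, 0)
    _, header_len, page_count, bits = struct.unpack_from(">HHHH", apnx_bytes, block_start)
    if bits != 32:
        raise ValueError(f"unexpected entry size: {bits} bits")
    table_start = block_start + 8 + header_len
    return list(struct.unpack_from(f">{page_count}I", apnx_bytes, table_start))


def dump_pages(offsets, raw_markup, width=16):
    """Pair each offset with the first `width` bytes of raw markup at that offset."""
    rows = []
    for index, offset in enumerate(offsets, start=1):
        snippet = raw_markup[offset:offset + width].decode("utf-8", errors="replace")
        rows.append((index, offset, snippet))
    return rows
```

Reading the apnx file and assembled_text.dat as bytes and passing them through these two functions gives one row per page; a row whose snippet starts with "<" at a page anchor is a match. The index here is just the position in the table, not the printed page label.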

I'm attaching apnx_dump.pl