Quote:
Originally Posted by tomsem
|
Since I am currently working on improving the handling of page images when converting Print Replica books, I took a look at the book you mentioned.
Quote:
Originally Posted by tomsem
These evidently were Fixed Layout EPUBs that the publisher submitted to the Kindle store, and they get converted to fixed-layout KFX when downloaded to Kindles or the Kindle apps. These are sorta 'coffee table books' whose print editions would be larger than letter-size pages, with two-page spreads.
Unlike comic book format, they contain 'positioned' text in the XHTML files. On ePub platforms, you're usually able to search, and fonts are embedded so that the text remains crisp when zooming in.
On the Kindle platform (at least for books from this publisher), the embedded fonts are gone, and hyperlinks and text search are not supported (even though Kindles will happily spend time indexing them). But the text and links are defined in the downloaded file.
In the past, I would download the AZW3 and convert it to a fixed-layout ePub with KindleUnpack. (I can still do this using my Kindle Touch, even though it's impossible to actually read the book on it!) These do contain the positioned text and hyperlinks of the original ePub.
|
Examining the book in KF8 format indicates that it was published to Kindle as a comic and sourced with a fixed-layout EPUB. Each page of the book is composed of a 1607x1920 JPEG background image placed on a fixed-layout 1611x1924 page with an overlay of invisible text.
In that book the background images have everything the reader sees, so the invisible text was possibly intended for annotation, for dictionary lookup, and to provide links to other pages in the table of contents.
Quote:
Originally Posted by tomsem
But with KFX, conversion from KFX to ePub (or PDF) currently mishandles the text and hyperlinks.
In the ePub, the text is put inside a <div> tag's alt= attribute, the positioning is lost, and it can no longer be the target of a text search. Hyperlinks are not instantiated.
|
The conversion by Amazon of that book to KFX format resulted in significant differences.
The background image for each page becomes a recompressed 1608x1920 JPEG image encapsulated within a single-page PDF file.
The invisible text from each page is only provided as alt-text to the background image with no links or formatting retained.
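For anyone who wants to compare the image sizes themselves (1607x1920 in KF8 versus 1608x1920 in KFX), the pixel dimensions can be read straight from the JPEG frame header without decoding the image. This is a minimal standard-library sketch; the function name `jpeg_dimensions` is hypothetical, and it assumes you have already extracted the raw JPEG bytes from the AZW6 or PDF container (not shown).

```python
import struct

# Start-of-frame markers that carry the image size (baseline, progressive, etc.).
# 0xC4 (DHT), 0xC8 (JPG) and 0xCC (DAC) share the range but are not frames.
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(data: bytes) -> tuple:
    """Return (width, height) from the first SOF segment of a JPEG."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == 0xFF:  # padding byte before a marker
            pos += 1
            continue
        if marker == 0xD9:  # EOI reached without a frame header
            break
        if marker in (0xD8, 0x01) or 0xD0 <= marker <= 0xD7:
            pos += 2  # standalone markers have no length field
            continue
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Segment layout: length(2) precision(1) height(2) width(2) ...
            height, width = struct.unpack(">HH", data[pos + 5:pos + 9])
            return width, height
        pos += 2 + length  # skip this segment (length includes itself)
    raise ValueError("no SOF segment found")
```

Running it over the page images from both versions of a book would show whether the KFX copy was resized as well as recompressed.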
Quote:
Originally Posted by tomsem
In PDF, I can see text objects with Acrobat object inspection, but they are not placed over the word images they correspond to, and there are PDF links, but they do not work (see attached screenshot).
|
The PDF pages corresponding to the book's table of contents contain unused, non-functional link annotations with destination URIs such as: file:///opt/amazon/tmp/c8a7ae45-421b-456b-86d1-761a8cfa14cb-3e976d4c-db0f-47f5-a9d4-dc167f1fd903/f2d4b40f-762f-41f0-995d-a3f20de639da/extractedEpub/OEBPS/page_000010.xhtml
Trying to turn them into something useful would be hit-or-miss, so I am not going to attempt it.
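To illustrate why this is hit-or-miss, here is a minimal Python sketch of the obvious approach: take the file name from the leftover `file:` URI and pull a page number out of it. The function name and the `page_NNNNNN.xhtml` naming pattern are assumptions based only on the single example URI above; other publishers may name their files differently, and nothing guarantees that the number matches the position of the page in the KFX book.

```python
import re
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import unquote, urlparse

# Assumed naming pattern, taken from the one example URI; not a documented format.
PAGE_FILE_RE = re.compile(r"page_(\d+)\.xhtml$")


def target_page_number(uri: str) -> Optional[int]:
    """Guess the target page number of a leftover link annotation URI.

    Returns None if the URI is not a file: URI or if the file name does
    not follow the assumed page_NNNNNN.xhtml pattern.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "file":
        return None
    name = PurePosixPath(unquote(parsed.path)).name
    match = PAGE_FILE_RE.search(name)
    return int(match.group(1)) if match else None
```

Even when this returns a number, mapping it to an actual page still depends on how the publisher named and ordered the XHTML files in the original EPUB.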
Quote:
Originally Posted by tomsem
So the information is in there, but it is lost in translation. It would be nice if it were not, but I would rank it a low priority on your feature backlog, given that this is kind of an edge case.
|
The next release of the KFX Input plugin will obtain slightly higher quality background images from this book and those like it when converting to EPUB or CBZ, but they will still be of lower quality than the images of the same book in KF8 format with an AZW6 HD image container.
That is the best I can do. The conversion to KFX by Amazon does not leave enough information to properly reconstruct the overlay text and links from the original EPUB that was provided by the publisher.