Old 08-08-2025, 12:13 AM   #1056
tomsem
Grand Sorcerer
Posts: 6,959
Karma: 27060153
Join Date: Apr 2009
Location: USA
Device: iPhone 15PM, Kindle Scribe, iPad mini 6, PocketBook InkPad Color 3
Quote:
Originally Posted by jhowell

Examining the book in KF8 format indicates that it was published to Kindle as a comic and sourced with a fixed-layout EPUB. Each page of the book is composed of a 1607x1920 JPEG background image placed on a fixed-layout 1611x1924 page with an overlay of invisible text.

In that book the background images have everything the reader sees so the invisible text was possibly intended for annotation, dictionary lookup, and to provide links to other pages in the table of contents.



The conversion by Amazon of that book to KFX format resulted in significant differences.

The background image for each page becomes a recompressed 1608x1920 JPEG image encapsulated within a single page PDF file.

The invisible text from each page is only provided as alt-text to the background image with no links or formatting retained.



The PDF pages corresponding to the book's table of contents contain unused, non-functional link annotations with destination URIs such as: file:///opt/amazon/tmp/c8a7ae45-421b-456b-86d1-761a8cfa14cb-3e976d4c-db0f-47f5-a9d4-dc167f1fd903/f2d4b40f-762f-41f0-995d-a3f20de639da/extractedEpub/OEBPS/page_000010.xhtml

Trying to turn them into something useful would be hit-or-miss so I am not going to attempt it.



The next release of the KFX Input plugin will obtain slightly higher quality background images from this book and those like it when converting to EPUB or CBZ, but it will still be less than the image quality of the same book in KF8 format with an AZW6 HD image container.

That is the best I can do. The conversion to KFX by Amazon does not leave enough information to properly reconstruct the overlay text and links from the original EPUB that was provided by the publisher.
Thanks!

The conversion from KFX to PDF is still useful enough: I can OCR the PDF to add text objects and fix the links manually. For this type of book, it seems to me, PDF is more functional than KFX or fixed-layout ePub, to the extent one wants to annotate or search.
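
For the manual link fixing, here is a rough Python sketch of how the leftover link targets might be turned into page numbers. It assumes every target ends in a file named like page_000010.xhtml, as in the one URI quoted above, and the function name is just something I made up for illustration. Whether that number lines up with the PDF page index (0-based, 1-based, or shifted by a cover page) is something I would have to check against the actual book.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse

# Assumption: page files are named page_NNNNNN.xhtml, based on the single
# example URI quoted in this thread. Other books may use a different scheme.
PAGE_FILE_RE = re.compile(r"^page_(\d+)\.xhtml$")


def page_number_from_uri(uri):
    """Return the integer embedded in a page_NNNNNN.xhtml target, or None.

    Hypothetical helper: it only parses the file name and makes no claim
    about how that number relates to the PDF's own page numbering.
    """
    path = unquote(urlparse(uri).path)    # drops any #fragment or ?query
    file_name = PurePosixPath(path).name  # last path component
    match = PAGE_FILE_RE.match(file_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Shortened, made-up path that keeps only the file name pattern.
    print(page_number_from_uri("file:///tmp/extractedEpub/OEBPS/page_000010.xhtml"))
```

Anything that does not match the pattern comes back as None, so those links can be left alone and fixed by hand.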

There are a couple of GitHub projects that I'm planning to investigate more (both as tools to have around and as a possible basis for a calibre plugin...):

- https://github.com/mashu3/epub2pdf/

- https://github.com/aourednik/pdf2epub3fixed

There is a current Humble Bundle offer for some of the DK books; they would be in ePub format (from Kobo.com):

https://www.humblebundle.com/books/h...sdk_bookbundle

Last edited by tomsem; 08-09-2025 at 12:47 PM.