How to download full resolution images? - Page 2

jhowell · 01-02-2019, 03:42 PM

Quote:

Originally Posted by j.p.s

But if the KFX Input plugin makes an EPUB with high resolution images, why isn't it trivial to convert that to azw3?

It is trivial to do that and I think it makes sense to either do that or just live with the lower resolution images of azw3 as delivered by Amazon.

The potential problem with using KFX is that in conversion to/from KFX some of the details of the original HTML coding of the book are modified. A purist who wants the original HTML code along with the highest resolution images would need to take the best of both formats. Personally, I wouldn't bother.

rashkae · 01-04-2019, 12:21 AM

Or, you know, buy your light novels from Kobo? Just a thought.

HarryT · 01-04-2019, 03:59 AM

What is a "light novel"? That's an expression I haven't come across before!

rashkae · 01-04-2019, 07:13 AM

Down the rabbit hole we go:
https://en.wikipedia.org/wiki/Light_novel

HarryT · 01-04-2019, 07:16 AM

Thanks!

ATimson · 01-04-2019, 09:42 PM

Quote:

Originally Posted by rashkae

Or, you know, buy your light novels from Kobo? Just a thought.

That's also something I'm considering. (Well, Nook, but same basic idea.) However, my machine with the Nook software (and ADE) is not my everyday machine; plus I'd miss buying right from my Kindle... Which is why I wanted to get more info on what it'd take to keep using Amazon, rather than just giving up.

twynn92 · 12-23-2020, 02:57 PM

Quote:

Originally Posted by j.p.s

Substitute high res images into EPUB. You now have your archival book.

Resurrecting an essentially dead topic here, but if I used the command-line interface from the KFX plugin and Kindleunpack to make an epub for both KFX and KF8 files respectively, will the manifest in the opf list the images in the same order? The reason why I ask this is because I have no usable vision, but am interested in constructing an epub with full-resolution images whenever possible. If they are in the exact same order, I can use regular expressions and the terminal to mass-rename the images, paste them in place of the low-resolution images in the KF8, and repack using Calibre's ebook-polish CLI. And yes, I realize I can just simply look at the actual HTML source to determine, but that would be a painstaking process indeed, and am hoping someone with more knowledge of both KF8 and KFX formats would know the answer to my question.

Why not just stick with the KFX output? The main problem, for someone who relies on semantic information, is simply the fact that there is no semantic information when it comes to heading hierarchy in KFX format. Even when a KF8 delineates with perfectly compliant heading hierarchy, that information is not transferred over via KFX, though the visual formatting seems to still be the same as far as I can tell (did not make a one-to-one comparison). And it appears that the KFX format, even when using the clunky interface of the Kindle for PC app, does not pass the heading information to assistive technologies either, even if the details for the book says that screen reader is supported -- go figure.

Thanks for any time and effort spent on entertaining my query, and happy holidays to all.

jhowell · 12-23-2020, 04:54 PM

Quote:

Originally Posted by twynn92

Resurrecting an essentially dead topic here, but if I used the command-line interface from the KFX plugin and Kindleunpack to make an epub for both KFX and KF8 files respectively, will the manifest in the opf list the images in the same order?

No, they will not. The order of images in the OPF manifest when converted to EPUB provides no useful information. It will be alphabetical in both of these cases.

Images in KF8 are assigned numbers based on their order in the file. That often follows the order in which they are called out in the book, but not always. Images in KFX tend to have more arbitrary names assigned to them in different ways that have changed over time.

Parsing the HTML files that make up the book in spine sequence to generate a list of image names in order of reference would probably work best. Those would correspond between AZW3 and KFX formats in most cases, although there may be some circumstances where the lists might need to be tweaked a bit. Those lists could be used as the basis for substituting the proper files by renaming the ones from KFX to match the AZW3 names.
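A minimal sketch of that spine-order walk, assuming the book has already been converted to a DRM-free EPUB. The function name `images_in_reading_order` and the regular expression are illustrative only and are not part of KindleUnpack or the KFX Input plugin. It reads the OPF spine, opens each content document in that order, and collects each image reference once, in the order it is first seen.

```python
import posixpath
import re
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}
# Assumption: images are referenced either by <img src="..."> or by an
# SVG <image xlink:href="..."> / <image href="..."> element.
IMG_REF = re.compile(
    r'<(?:img|image)\b[^>]*?\b(?:src|xlink:href|href)\s*=\s*["\']([^"\']+)["\']',
    re.I,
)

def images_in_reading_order(epub_path):
    """Return image paths (relative to the EPUB root) in order of first reference."""
    with zipfile.ZipFile(epub_path) as z:
        # container.xml points at the OPF package document.
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find(".//c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
        opf_dir = posixpath.dirname(opf_path)
        manifest = {
            item.get("id"): item.get("href")
            for item in opf.findall(".//opf:manifest/opf:item", NS)
        }
        seen, ordered = set(), []
        # The spine, not the manifest, defines reading order.
        for itemref in opf.findall(".//opf:spine/opf:itemref", NS):
            doc_path = posixpath.normpath(
                posixpath.join(opf_dir, manifest[itemref.get("idref")])
            )
            text = z.read(doc_path).decode("utf-8", errors="replace")
            for ref in IMG_REF.findall(text):
                target = posixpath.normpath(
                    posixpath.join(posixpath.dirname(doc_path), ref)
                )
                if target not in seen:
                    seen.add(target)
                    ordered.append(target)
        return ordered
```

Running this on the EPUB made from the AZW3 and on the one made from the KFX gives two lists that can be compared entry by entry. Percent-encoded file names are not decoded here, so a book with unusual image names may need extra handling.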

twynn92 · 12-23-2020, 06:01 PM

Quote:

Originally Posted by jhowell

Parsing the HTML files that make up the book in spine sequence to generate a list of image names in order of reference would probably work best. Those would correspond between AZW3 and KFX formats in most cases, although there may be some circumstances where the lists might need to be tweaked a bit. Those lists could be used as the basis for substituting the proper files by renaming the ones from KFX to match the AZW3 names.

Ah, that's what I thought. So not as trivial as I envisioned, though I could probably construct a regexp easily enough given a single HTML input (perhaps through Calibre's htmlz file format), then remove all duplicated entries? It's almost not worth the hassle, even if it's for archival and future-proofing purposes -- I'll have to give it some further thought. I never realized things were so different until I looked at Amazon's details section, which provided the estimated file size. Seeing that the KFX output roughly matches what Amazon reported, I then started web searching, which led me to this forum thread.

As I care more about the semantics for reading purposes, assume that these low-resolution images are human-viewable for the majority of screen sizes and image types? I mean it's roughly half the size for each image, e.g., 22.8 vs. 10.4 MB for 206 images, hopefully the pictures, maps, hand-written material, etc are not too blurry? It would be nice to have the best of both worlds, but if it's not easily feasible then maybe I'd call it a day and switch to a different reading platform in future which others have also considered here...

Thanks for taking the time to answer. It is highly appreciated.

jhowell · 12-23-2020, 06:37 PM

Quote:

Originally Posted by twynn92

As I care more about the semantics for reading purposes, assume that these low-resolution images are human-viewable for the majority of screen sizes and image types? I mean it's roughly half the size for each image, e.g., 22.8 vs. 10.4 MB for 206 images, hopefully the pictures, maps, hand-written material, etc are not too blurry?

Regular KF8 files have images that are of acceptable quality at Kindle screen resolutions, but they will often not show any more detail if zoomed in. It's too bad that there is no single Kindle format that has the best of everything.

As another example, I did a test using a Kindle book with lots of images, Better Homes and Gardens 13x9 The Pan That Can: 150 Fabulous Recipes. The product description for this book shows it as having 307 pages and taking 114MB.

Downloading it using "Download & transfer via USB" yields a 20MB AZW3 file. It contains 173 image files with the largest being 128KB.

Downloading the same book using Kindle for PC yields a combined 114MB KFX file. It also has 173 images, but the largest is 2MB (about 16x larger).

j.p.s · 12-23-2020, 07:27 PM

Quote:

Originally Posted by jhowell

Regular KF8 files have images that are of acceptable quality at Kindle screen resolutions, but they will often not show any more detail if zoomed in. It's too bad that there is no single Kindle format that has the best of everything.

Too bad, indeed.

Quote:

As another example, I did a test using a Kindle book with lots of images, Better Homes and Gardens 13x9 The Pan That Can: 150 Fabulous Recipes. The product description for this book shows it as having 307 pages and taking 114MB.

Downloading it using "Download & transfer via USB" yields a 20MB AZW3 file. It contains 173 image files with the largest being 128KB.

Downloading the same book using Kindle for PC yields a combined 114MB KFX file. It also has 173 images, but the largest is 2MB (about 16x larger).

Thanks for the details on how extreme it can be.

I think you have also written before that K4PC configured to download KF8 yields images in between.

jhowell · 12-23-2020, 07:56 PM

Quote:

Originally Posted by j.p.s

I think you have also written before that K4PC configured to download KF8 yields images in between.

It depends on the book. In the case of this particular book disabling KFX in Kindle for PC yields the same result as using “Download & transfer via USB”.

twynn92 · 12-25-2020, 09:31 PM

Just a quick update: ebook-convert's HTMLZ output format is not suitable at all because it renames the images, i.e., starting with 00000, and incrementing by 1 for each successive image. Pandoc is way better for this use case, but now I am running into the fact that the KF8 text refers to one more image than the KFX, so will have to take a look at the surrounding text to see why that is, and whether it will be trivial to work around or not, e.g., I can just discard the first or last image and have it matching in all other respects. Regular expressions and Notepad++ functions are really helping here, but it is definitely not easily automatable for sure.

Should I just leave this topic be, i.e., no further reports, as having the best of both worlds seems to be a specific use case that no one else really needs? In other words, this information would only be useful for someone like me who wants to have the highest resolution for images where possible but also keeping the semantic information. Everyone else is probably just satisfied with the KFX output, since most users are likely to run it through Calibre anyways, which bloats the code a bit and definitely does not leave it untouched no matter what arguments are used. Even if I were to be successful, it's not like it'll help anyone else out...

jhowell · 12-25-2020, 10:45 PM

Quote:

Originally Posted by twynn92

Should I just leave this topic be, i.e., no further reports, as having the best of both worlds seems to be a specific use case that no one else really needs?

I am interested in finding out whether or not this is possible and would be happy to learn more if you have anything to share.

twynn92 · 12-26-2020, 12:47 AM

Quote:

Originally Posted by twynn92

I am running into the fact that the KF8 text refers to one more image than the KFX, so will have to take a look at the surrounding text to see why that is

Whoops. That was an error on my part, as I did not account for the fact that not all images use the img tag. I've since rectified that and made the search far less restrictive by just using a different pattern, and everything aligns exactly:
KF8 (from Kindle Unpack-created EPUB): "Images/[^\.]+\.\w+"
KFX (from KFX Input-generated EPUB): "image_[^\.]+\.\w+"
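For anyone who prefers a script to Notepad++, here is a small sketch of how the two patterns above could be applied in Python. The helper name `ordered_unique` and the sample file names are made up for illustration. It finds every match in document order and drops repeats while keeping the first occurrence.

```python
import re

# The two patterns quoted above, unchanged.
KF8_PATTERN = re.compile(r"Images/[^\.]+\.\w+")
KFX_PATTERN = re.compile(r"image_[^\.]+\.\w+")

def ordered_unique(pattern, html):
    """All matches in document order, with later duplicates removed."""
    # dict preserves insertion order, so this keeps the first occurrence.
    return list(dict.fromkeys(pattern.findall(html)))

# Hypothetical fragments; real file names will differ from book to book.
kf8_html = '<img src="Images/image00001.jpg"/><image xlink:href="Images/cover00002.jpg"/>'
kfx_html = '<img src="image_rsrc1.jpg"/><img src="image_rsrc2.jpg"/><img src="image_rsrc1.jpg"/>'

kf8_names = ordered_unique(KF8_PATTERN, kf8_html)
kfx_names = ordered_unique(KFX_PATTERN, kfx_html)
```

Because the patterns look for the file name itself rather than an img tag, they also pick up references made through SVG image elements, which is what fixed the off-by-one count.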

Note: The first image in the KFX (referenced in part0000.xhtml) is supposed to be an SVG, while the corresponding file in the KF8 is an image, though the cover page in the KF8 (cover_page.xhtml) is also an SVG. In other words, the KF8 has two separate images -- the less compressed coverxxxxx, and the more compressed imagexxxxx.

As the book cover image in the KFX is larger in file size than either of the two, I could just duplicate and replace the two book covers with it; but since cover_page.xhtml in the KF8 seems to better match the first image in part0000.xhtml in the KFX, I'll probably rename that as the cover image in the KF8 -- leaving the first image in the KF8 alone. I'll have to see how things go when doing another book to see how things match.

Because the KFX didn't have an alt text attribute for any of the images (img tag), essentially making them invisible to screen readers, I had to construct a find/replace regexp to add in some fake alt text by duplicating the file name there. It wasn't pretty, but hacks rarely ever are. I could then make a textual comparison between the two versions to make sure that the images matched where they were supposed to. I just did the first and last five images to make sure I didn't have any offsets. Even though the total number of images in both files matched exactly, and removing the duplicates also still matched, I still wanted to make sure, especially since cover_page.xhtml didn't exist when converting the EPUB to HTML using Pandoc.
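The alt-text workaround could look roughly like the sketch below. This is not the exact regexp used in the post, which was not shared; the function name `add_placeholder_alt` is hypothetical. It copies the image file name into an alt attribute on every img tag that lacks one and leaves tags that already have alt text alone.

```python
import html
import re

IMG_TAG = re.compile(r"<img\b[^>]*>", re.I)
SRC_ATTR = re.compile(r'\bsrc\s*=\s*["\']([^"\']+)["\']', re.I)
ALT_ATTR = re.compile(r"\balt\s*=", re.I)

def add_placeholder_alt(markup):
    """Give every <img> without alt text an alt equal to its file name."""
    def fix(match):
        tag = match.group(0)
        if ALT_ATTR.search(tag):
            return tag  # existing alt text is kept as-is
        src = SRC_ATTR.search(tag)
        if not src:
            return tag  # nothing to copy the name from
        name = src.group(1).rsplit("/", 1)[-1]
        # Insert right after "<img" so the rest of the tag is untouched.
        return tag[:4] + f' alt="{html.escape(name, quote=True)}"' + tag[4:]
    return IMG_TAG.sub(fix, markup)
```

The file name is only a placeholder so that a screen reader announces each image and the two versions can be compared as text. It is not a substitute for real descriptions.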

As this is definitely doable, and the gruntwork of renaming can be automated, I can expend the effort for books that I know to have a lot of images, but only for those, as it's still very much an annoying process. Essentially, the steps are:
1. Download both KF8 and KFX versions, decrypt with DeDRM, and convert to EPUB using two different CLIs.
2. Convert the EPUBs to HTML using Pandoc to get the image file names in viewing order.
3. Use Notepad++ to strip the source of everything but the image file names (one per line) using a combination of regexp and search functionality.
4. Make sure the total number of references matches in both files, and use another regexp to strip the duplicated entries, again making sure the total number of entries matches. Also make sure the file extensions match, though hopefully that will never be an issue since Amazon already converts the images anyway.
5. Paste the contents of both lists into one file, using Notepad++ and a regexp to combine the file names line-by-line to create a rename command in terminal.
6. Rename using a terminal, replace the low-resolution images with the high-resolution images, and finally, repackage the final EPUB with the ebook-polish CLI.
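
Steps 4 to 6 can also be scripted instead of building rename commands by hand. The following is a rough sketch under the assumption that both name lists are already deduplicated and in viewing order; `build_rename_map` and `replace_images` are hypothetical helper names, and the final repack with ebook-polish is left out.

```python
import shutil
from pathlib import Path

def build_rename_map(kf8_names, kfx_names):
    """Pair the two ordered lists; returns {kfx file name: kf8 file name}."""
    if len(kf8_names) != len(kfx_names):
        raise ValueError(
            f"count mismatch: {len(kf8_names)} KF8 vs {len(kfx_names)} KFX"
        )
    mapping = {}
    for kf8, kfx in zip(kf8_names, kfx_names):
        # Step 4: the extensions are expected to agree as well.
        if Path(kf8).suffix.lower() != Path(kfx).suffix.lower():
            raise ValueError(f"extension mismatch: {kf8} vs {kfx}")
        # Only the base names are used, so an "Images/" prefix is harmless.
        mapping[Path(kfx).name] = Path(kf8).name
    return mapping

def replace_images(mapping, kfx_image_dir, kf8_image_dir):
    """Copy each high-resolution KFX image over its KF8 counterpart."""
    for kfx_name, kf8_name in mapping.items():
        shutil.copyfile(
            Path(kfx_image_dir) / kfx_name,
            Path(kf8_image_dir) / kf8_name,
        )
```

Copying rather than renaming leaves the KFX-derived folder intact, so the first and last few images can still be spot-checked afterwards. The cover image differs between the two formats as noted above and may need to be handled by hand.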
