Problems with downsampled hi-def images on standard-definition screens

MaudlinHaus · 11-12-2014, 12:16 PM

Hi All-

First time posting, long time reading posts here. Tons of knowledge and insight.

I work for a small publisher doing desktop publishing, creating graphics, and overseeing ebook conversion. We don't create our epubs in house (we send PDFs to a conversion house), but since I have a background in HTML/CSS, I have the opportunity to work closely with our vendor to get the kind of output we want, standardize design, etc.

One issue we initially had with the conversion house was image resolution. We proof our books on an iPad, which has a higher density display than a standard screen, and the display of images in the books was really poor. I did some research and came across a good article detailing a few methods for creating graphics for a high-density display (http://www.smashingmagazine.com/2012...ds-retina-web/). It's not ebook-specific advice, but applying a version of the HTML/CSS principle they lay out has given us really great looking images on the iPad.

However, the hi-def images look crap in Adobe Digital Editions for desktop PC (1280x1024 screen--I know, not the most modern setup). The article warned that there'd be some downsampling and possible loss of quality on standard def screens, but the difference is really stark. Some images that have text in them are almost unreadable in ADE. What to do? Have any other posters wrestled with this issue? I want to future proof the books but I don't want to alienate desktop readers or make them think we have a crap product. Thanks in advance for any help you can offer.

Toxaris · 11-12-2014, 01:17 PM

Well, the first thing you can do is stop sending PDF to your conversion house. That is the most crappy format to start from...

With regards to your image problem, that is just the current situation. Downsampling quality really depends on the reader's resizing algorithm, and in a lot of readers those are not great. Also, high resolution is fine, but there are millions of readers that do not support those resolutions. So, there is a problem there. Some publishers make two versions: one with high-resolution images and one with lower-resolution images. If you have full-screen images, you could use an SVG wrapper. Downscaling of those is usually acceptable.
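For anyone unfamiliar with the "SVG wrapper" idea: the image is referenced from inside an inline SVG whose viewBox matches the image's pixel dimensions, and the reader scales the whole thing to the available space. Below is a minimal sketch in Python that generates that markup. The function name, the `../Images/cover.jpg` path, and the dimensions are made up for illustration, and support still has to be checked on the actual readers you care about.

```python
from xml.sax.saxutils import quoteattr


def svg_wrapper(href: str, width_px: int, height_px: int) -> str:
    """Return inline SVG markup that scales an image to the viewport.

    width_px/height_px are the image's real pixel dimensions; they set the
    viewBox so the aspect ratio is preserved when the reader scales it.
    """
    return (
        '<svg xmlns="http://www.w3.org/2000/svg" '
        'xmlns:xlink="http://www.w3.org/1999/xlink" version="1.1" '
        'width="100%" height="100%" '
        f'viewBox="0 0 {width_px} {height_px}" '
        'preserveAspectRatio="xMidYMid meet">\n'
        f'  <image width="{width_px}" height="{height_px}" '
        f'xlink:href={quoteattr(href)}/>\n'
        '</svg>'
    )


# Hypothetical file name and dimensions, for illustration only.
print(svg_wrapper("../Images/cover.jpg", 1200, 1600))
```

The generated snippet would go in the body of the XHTML page that holds the full-screen image.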

theducks · 11-12-2014, 01:37 PM

Quote:

Originally Posted by Toxaris

Well, the first thing you can do is stop sending PDF to your conversion house. That is the most crappy format to start from...

PDF is a Printed PAGE format.

DaleDe · 11-12-2014, 02:32 PM

Quote:

Originally Posted by Toxaris

Well, the first thing you can do is stop sending PDF to your conversion house. That is the most crappy format to start from...

With regards to your image problem, that is just the current situation. Downsampling quality really depends on the reader's resizing algorithm, and in a lot of readers those are not great. Also, high resolution is fine, but there are millions of readers that do not support those resolutions. So, there is a problem there. Some publishers make two versions: one with high-resolution images and one with lower-resolution images. If you have full-screen images, you could use an SVG wrapper. Downscaling of those is usually acceptable.

SVG works fine for images that are not full screen. It does not work for images where the text is wrapped around the image, at least I have not seen an example of that.

Dale

MaudlinHaus · 11-12-2014, 03:39 PM

So SVG wrapper affects how a file is downsampled? And the downsampling is smoother than if the size is just called out using html/CSS? What about support across readers/devices--are there known (mainstream) readers that definitely don't support SVG?

RE: PDF, I would guess that most presses send PDF for conversion. What format would you send instead?

Tex2002ans · 11-12-2014, 09:04 PM

Quote:

Originally Posted by MaudlinHaus

[...] We don't create our epubs in house (we send PDFs to a conversion house), but since I have a background in HTML/CSS, I have the opportunity to work closely with our vendor to get the kind of output we want, standardize design, etc.

[...]

RE: PDF, I would guess that most presses send PDF for conversion. What format would you send instead?

I would then recommend exporting directly to HTML or EPUB from whatever program you are creating the book in. I am assuming you are using InDesign/Quark?

You will then have to either do the HTML/EPUB cleaning in-house, or find a conversion place (or independent contractor) that DOES handle InDesign/EPUB/HTML files directly.

(Note: These conversion houses are most likely going to be more expensive than the dirt-cheap "Indian"/"Chinese" conversion companies, but you will pay slightly more for much higher quality).

The closer you can work to the original source material, the better!

Look, you ALREADY have the exact digital text... leading it through PDF is just going to create a whole host of extra problems. PDF is meant as a final OUTPUT format, it is just about THE WORST format to ever work backwards from.

Workflow A (Correct):

1. InDesign/Quark/Word
   - Output to EPUB/HTML
2. Clean the EPUB
   - Since all of the text matches EXACTLY, all your time is just spent on cleaning up ugly code.
   - Then you just have to spend some time making sure everything was exported correctly
     - Making sure captions went below all the images
     - Footnotes are working and are in the right place, etc. etc.
3. Final EPUB
   - Test on your device, and fix up any minor mistakes you catch.

Workflow B (Horrible, Inefficient, Waste):

1. InDesign/Quark
   - Output to PDF
2. OCR the PDF
   - This is where you run it through a program which takes the page image, and tries to "guess" what each character is.
   - This leads to A TON of extra steps... and depending on the complexity of the book, HOURS AND HOURS.
   - For example, here is one of my posts explaining the entire PDF -> EPUB process: https://www.mobileread.com/forums/sho...72&postcount=6
   - Also, the text might not be 100% correct; typos might be introduced.
3. Clean the EPUB
   - Many many hours of work, you have to make sure all the paragraphs are attached, plus all of the same work as Workflow A (captions, footnotes, etc. etc.).
   - Now, since the text might not be 100% correct, you also have to spend a lot of time spellchecking, looking for typos, fixing hyphenation, and finding weird symbols that might have been introduced by the OCR.
     - Missing accents on letters
     - Extra apostrophes.
     - Squiggly brackets instead of normal brackets/parentheses.
     - [...]
4. Final EPUB
   - Test on your device, and fix up any minor mistakes you catch.

It can turn a "few hour" job of just cleaning code into a "many hour" job.

Quote:

Originally Posted by MaudlinHaus

One issue we initially had with the conversion house was image resolution. We proof our books on an iPad, which has a higher density display than a standard screen, and the display of images in the books was really poor.

You have the source files... you have the original image files. Just plop those right into the EPUB, and fix up whatever needs to be tweaked (filenames, file size, etc. etc.).

For example, you don't want your original 3-5MB+ cover file in your EPUB. You might want to save that as a lower quality JPG.

What typically happens is that the method used to pull the image from the PDF -> EPUB degrades it. Again, this is one of the flaws of working with PDF as the INPUT format.

You have the advantage, because you guys already have all the source files.

So what you would typically do, is hand over your original InDesign files, PLUS, hand over a ZIP file of all the original images. (This is how I handle InDesign -> EPUB conversions).

Quote:

Originally Posted by MaudlinHaus

However, the hi-def images look crap in Adobe Digital Editions for desktop PC (1280x1024 screen--I know, not the most modern setup). The article warned that there'd be some downsampling and possible loss of quality on standard def screens, but the difference is really stark.

Hmmm, would it be possible to show any examples? I personally haven't seen things get TOO bad from high resolution -> lower resolution images. Only spot I can think of is if text is in the images.

And as Toxaris said, it is really up to the resizing algorithms of the device.

Quote:

Originally Posted by MaudlinHaus

Some images that have text in them are almost unreadable in ADE. What to do? Have any other posters wrestled with this issue?

Text....... in images... you say? That is one of my biggest pet peeves! What sort of data is being displayed here... are these tables saved as images? Are these charts/graphs?

My personal philosophy is to avoid text in images as much as possible, and try to pull as much of that into the HTML equivalent as possible. Where you HAVE to use it, save as PNG (AVOID JPG IN THAT CASE).

If it is a vector chart/graph, already in InDesign/Illustrator, then it would probably be best to go back to the source material, and generate a "lower resolution" PNG directly from the vector files!

Side Note: I wrote about advantages/disadvantages of HTML/Image Tables here: https://www.mobileread.com/forums/sho...d.php?t=223062

Quote:

Originally Posted by MaudlinHaus

I want to future proof the books but I don't want to alienate desktop readers or make them think we have a crap product. Thanks in advance for any help you can offer.

So say we all... you should be releasing high quality stuff... not crap! Which is why you should avoid those dirt-cheap conversion companies... that way will only bring headaches (and you will have a lot more overhead/problems in the long-run). The correct way is to get it done RIGHT the first time. Not, get it done cheaply/crappily, and then pay someone to come around AGAIN, to have to double-check all the work, and clean up the file and get it done right.

The entire reason I got into this in the first place was because of EPUBs that were HORRIBLY converted. Tons of typos, tons of mistakes, horrible code, low-resolution images, etc. etc.

Anyway, I work on non-fiction economics books mostly, and I do a lot of work doing PDF -> EPUB conversions (mostly from scanned books).

When I work on newer books though, where we have the original InDesign files, that is DEFINITELY the way to go. Avoid PDF completely if you can.

Side Note: This isn't getting into the discussion of perhaps having to change the entire "print book" workflow. Typically, companies have a "print book FIRST" mentality, and the ebook is just a dirty afterthought. What has to start happening is a shift to an "HTML/ebook FIRST" mentality, and designing the books in InDesign in ways which make it easier to generate both (consistent usage of styles/classes, etc. etc.).

MaudlinHaus · 11-14-2014, 05:41 PM

You make some good points, but you make a bunch of incorrect statements about PDF. If you generate a PDF from InDesign, you get fully searchable, highlightable, copyable text--OCR is not in the picture at all. And if you export from InDesign with no downsampling, a 3-inch, 300dpi image in InDesign is a 3-inch, 300dpi image in the PDF--there's no loss there. And as I said, from the PDF source file, using CSS/HTML to squish 2x pixels into a given screen pixel space (a 400 px wide image gets width="200", for example), we get high quality results on the iPad screen (I'm a little afraid of what is going on with the various HD Kindles, but I like the iPad better as a high-density screen standard).

The issue I'm seeing, as far as I can tell, is that by allowing the reader software (which is essentially a browser) to resize with HTML/CSS, you can end up tasking the software with re-rendering a bunch of pixels into a really small space. Squishing a 400 px image into 200 screen pixels looks good in iBooks because the screen resolution is pretty high, but it ends up looking pretty bad in ADE on an older monitor due to extreme downsampling and lower pixels per inch.
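To put numbers on that, here is a rough sketch in Python of the pixel arithmetic. It assumes a device pixel ratio of 2 for a Retina iPad and 1 for an older desktop monitor; the function names are invented for this example. It only counts pixels, so it says nothing about how good or bad a particular reader's resampling filter is.

```python
def device_pixels(css_width: float, device_pixel_ratio: float) -> float:
    """Physical screen pixels covered by an element of the given CSS width."""
    return css_width * device_pixel_ratio


def scale_factor(image_px: int, css_width: float, device_pixel_ratio: float) -> float:
    """Ratio of screen pixels to image pixels (1.0 means no resampling)."""
    return device_pixels(css_width, device_pixel_ratio) / image_px


# A 400 px wide image displayed with width="200".
for label, dpr in [("Retina iPad (assumed DPR 2)", 2.0),
                   ("older desktop monitor (assumed DPR 1)", 1.0)]:
    print(label, scale_factor(400, 200, dpr))
```

Under these assumptions the iPad shows the image pixel-for-pixel (factor 1.0), while the desktop reader has to throw away half the pixels in each direction (factor 0.5), which is where fine detail such as text inside an image suffers.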

The more I think about this, the more I'm realizing how old the monitors we use at work are. I think even slightly newer monitors have better output and are hopefully not showing these problems to the reader. Also, I checked the same epub in ADE and Readium, and the downsampling in Readium is much smoother. It's a shame that the niceties that people have come to expect from browsers have not hit all of the ereader platforms. Yet another reason we need to just be able to open the books in the damn browser and call it a day :/

Tex2002ans · 11-14-2014, 07:45 PM

Quote:

Originally Posted by MaudlinHaus

You make some good points, but you make a bunch of incorrect statements about PDF. If you generate a PDF from InDesign, you get fully searchable, highlightable, copyable text--OCR is not in the picture at all.

You would think... you would think. PDF is quite a complex file type, and the way that many programs piece things together on the backend causes a heck of a lot of headaches. A lot of this is also VERY dependent on the settings that were actually used to create the PDF.

But get down into the nitty gritty, and things get UGLY. For example, ligatures might disappear in the text backend, and characters with diacritics, like 'ñ', might just show up as 'n'. (In the printed PDF though, you can see the little tilde + ligatures, but in the actual text backend, nope).

Then a lot of metadata can be tossed out the window, things such as footnotes/sidebars/headers/footers/captions, might not be marked as such. The PDF knows the LOCATION of this text, and it knows exactly where to plop them when you are printing/displaying it in a PDF Reader, but it doesn't know WHAT they are (this is extremely important when making an ebook).

You can pull the PLAIN TEXT out very easily (although, no formatting). But formatting is EXTREMELY important to the look of the book.

Then if you look at the actual code, oh boy. Using something like xpdf or poppler might get you this:

Spoiler:

Quote:

[{"top":599,"left":60,"width":32,"height":15,"font":2,"data":"Boer"},{"top":599,"left":92,"width":4,"height":15,"font":2,"data":" "},{"top":599,"left":95,"width":28,"height":15,"font":2,"data":"War"},{"top":599,"left":124,"width":4,"height":15,"font":2,"data":" "},{"top":599,"left":127,"width":54,"height":15,"font":2,"data":"Veteran"},{"top":599,"left":181,"width":4,"height":15,"font":2,"data":" "},{"top":599,"left":185,"width":41,"height":15,"font":2,"data":"Status"},{"top":620,"left":60,"width":53,"height":15,"font":2,"data":"Thomas"},{"top":620,"left":113,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":117,"width":59,"height":15,"font":2,"data":"returned"},{"top":620,"left":176,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":180,"width":14,"height":15,"font":2,"data":"to"},{"top":620,"left":194,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":197,"width":59,"height":15,"font":2,"data":"Adelaide"},{"top":620,"left":257,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":260,"width":14,"height":15,"font":2,"data":"as"},{"top":620,"left":275,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":278,"width":8,"height":15,"font":2,"data":"a"},{"top":620,"left":286,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":290,"width":63,"height":15,"font":2,"data":"wounded"},{"top":620,"left":353,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":356,"width":68,"height":15,"font":2,"data":"decorated"},{"top":620,"left":425,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":428,"width":32,"height":15,"font":2,"data":"Boer"},{"top":620,"left":460,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":464,"width":28,"height":15,"font":2,"data":"War"},{"top":620,"left":492,"width":4,"height":15,"font":2,"data":" "},{"top":620,"left":496,"width":54,"height":15,"font":2,"data":"Veteran"},{"top":620,"left":549,"width":4,"height":15,"font":2,"data":"

That is just to display the words "Boer War Veteran Status" and "Thomas returned to Adelaide as a wounded decorated Boer War Veteran". The way that PDF works is that it places words in EXACT positions. This is why you have things like "heuristics" in Calibre, to try to GUESS what goes where, what goes in what logical order, what font that was supposed to be, whether it was supposed to be bold/italics, and what is a paragraph. (Again, heuristics are going to get a lot of things wrong, and lots of errors get introduced.)
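To make the "guessing" concrete, here is a toy sketch in Python of the very first step such heuristics have to do: turning positioned fragments back into lines of text. It uses a few records copied from the dump above, deliberately shuffled. Grouping by an identical "top" value is a naive assumption that only holds for clean samples like this one, and the function name is invented for the example.

```python
import json
from itertools import groupby

# A few records from the dump above, deliberately out of order.
SAMPLE = """[
 {"top":620,"left":117,"width":59,"height":15,"font":2,"data":"returned"},
 {"top":599,"left":95,"width":28,"height":15,"font":2,"data":"War"},
 {"top":599,"left":60,"width":32,"height":15,"font":2,"data":"Boer"},
 {"top":620,"left":60,"width":53,"height":15,"font":2,"data":"Thomas"},
 {"top":599,"left":92,"width":4,"height":15,"font":2,"data":" "},
 {"top":620,"left":113,"width":4,"height":15,"font":2,"data":" "},
 {"top":599,"left":124,"width":4,"height":15,"font":2,"data":" "},
 {"top":599,"left":127,"width":54,"height":15,"font":2,"data":"Veteran"},
 {"top":599,"left":181,"width":4,"height":15,"font":2,"data":" "},
 {"top":599,"left":185,"width":41,"height":15,"font":2,"data":"Status"}
]"""


def rebuild_lines(records):
    """Group fragments by vertical position, then read each line left to right."""
    ordered = sorted(records, key=lambda r: (r["top"], r["left"]))
    return ["".join(r["data"] for r in group)
            for _, group in groupby(ordered, key=lambda r: r["top"])]


for line in rebuild_lines(json.loads(SAMPLE)):
    print(line)
```

Even when this works, all you get back is lines of plain text. Nothing here says that the first line is a heading, where a paragraph ends, or what was italic, which is exactly the information an ebook needs.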

Any way you slice it, pulling formatted text out of a PDF is a big waste of time (which is why in most cases, it is easier/faster to just re-OCR the entire thing).

Perhaps you have more knowledge of tools though. If so, teach me, I would LOVE to be able to pull out data from PDFs much more efficiently! It would be AMAZING. And then get people to start doing the PDF workflow that actually allows this to be possible!

And the original InDesign/Quark files ALREADY have all that nice formatting information just sitting in there, so if you export directly from there, that will be MUCH cleaner than trying to work backwards from the PDF.

Similar thing with images from PDF: I am saying that not every PDF -> XYZ conversion WILL pull out the image losslessly. Again, it all depends on how the conversion place you send it to does it. Perhaps they do it right, but in my experience, I have not seen that. Perhaps you have had better luck and sent it to a place that does it properly.

Which is why I settled on this method: you just send me the original images separately, and I can work from that. No need to go through some hideous PDF middleman.

Quote:

Originally Posted by MaudlinHaus

And if you export from InDesign with no downsampling, a 3-inch, 300dpi image in InDesign is a 3-inch, 300dpi image in the PDF--there's no loss there. And as I said, from the PDF source file, using CSS/HTML to squish 2x pixels into a given screen pixel space (a 400 px wide image gets width="200", for example), we get high quality results on the iPad screen (I'm a little afraid of what is going on with the various HD Kindles, but I like the iPad better as a high-density screen standard).

Hmmm, again, any examples?

Now, if you don't like the specific downsampling on the devices, then the only possible solution is to downsample the images using an outside program, and insert the lower resolution images in the file.

This downsampling talk reminded me a lot of what GrannyGrump does with high-quality line drawings scanned from older books. For example:

https://www.mobileread.com/forums/sho...15#post2682815

These specific types of drawings downscale HORRIBLY due to the downscaling algorithm on most devices. So the best bet would be to manually downscale using other tools, which might have more efficient/better algorithms for dealing with lines (Photoshop, GIMP, etc. etc.). So maybe you just pick a decent size resolution, like 1024x1024.
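If you go the manual route, the target size is just arithmetic. A small sketch in Python, assuming you want the image to fit inside a square box (1024 px here, per the suggestion above) without changing its aspect ratio and without ever enlarging it; the function name and the sample dimensions are made up. The actual resampling would still be done in Photoshop, GIMP, or similar.

```python
def fit_within(width: int, height: int, max_side: int = 1024) -> tuple:
    """Return (width, height) scaled down to fit a max_side x max_side box.

    The aspect ratio is preserved and images that already fit are left alone.
    """
    scale = min(max_side / width, max_side / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical source dimensions, for illustration only.
print(fit_within(3000, 2000))  # a large landscape scan
print(fit_within(800, 600))    # already small enough, left unchanged
```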

Quote:

Originally Posted by MaudlinHaus

The issue I'm seeing, as far as I can tell, is that by allowing the reader software (which is essentially a browser) to resize with HTML/CSS, you can end up tasking the software with re-rendering a bunch of pixels into a really small space. Squishing a 400 px image into 200 screen pixels looks good in iBooks because the screen resolution is pretty high, but it ends up looking pretty bad in ADE on an older monitor due to extreme downsampling and lower pixels per inch.

What other devices have you tested on, any eink devices? If you have a problem with those screens, you are probably going to have a heart attack if you see it on an older EPUB reader.

Again, any sort of examples would be helpful. I personally haven't seen any sorts of images that are TOO bad (besides images with text).

Toxaris · 11-15-2014, 02:47 AM

Quote:

Originally Posted by MaudlinHaus

You make some good points, but you make a bunch of incorrect statements about PDF. If you generate a PDF from InDesign, you get fully searchable, highlightable, copyable text--OCR is not in the picture at all. And if you export from InDesign with no downsampling, a 3-inch, 300dpi image in InDesign is a 3-inch, 300dpi image in the PDF--there's no loss there. And as I said, from the PDF source file, using CSS/HTML to squish 2x pixels into a given screen pixel space (a 400 px wide image gets width="200", for example), we get high quality results on the iPad screen (I'm a little afraid of what is going on with the various HD Kindles, but I like the iPad better as a high-density screen standard).

No, my statements regarding PDF are not incorrect. Sure, you can extract the image from a PDF, but the quality depends on the tools used and on how the PDF was constructed. It can go sour quite fast.

The sample from Tex2002ans is actually a nice illustration. Since PDF defines where on the page each piece of text must go, it can even store the text out of order in some cases. So, copying the text will then also give you text that is out of order. The only way you would know is by actually comparing the texts next to each other. No conversion house will do that due to the costs.
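That side-by-side comparison can at least be partly automated when you have the original text. A minimal sketch using Python's standard `difflib`, comparing word by word; the "extracted" sentence below is an invented example of an ordering error, and the function name is made up. On a full book you would run this chapter by chapter, and it only catches differences in the text itself, not lost formatting.

```python
import difflib


def word_diff(source_text: str, extracted_text: str) -> list:
    """List the (operation, source words, extracted words) that differ."""
    a = source_text.split()
    b = extracted_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, " ".join(a[i1:i2]), " ".join(b[j1:j2]))
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]


source = "Thomas returned to Adelaide as a wounded decorated Boer War Veteran"
# Invented example: two words come out of the PDF in the wrong order.
extracted = "Thomas returned to Adelaide as a decorated wounded Boer War Veteran"

for change in word_diff(source, extracted):
    print(change)
```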
