MobileRead Forums

jackie_w · 07-02-2013, 09:50 AM

Quote:

Originally Posted by BetterRed

jackie_w: I've been trying to think of a reason this might happen; the only thing that came to mind is progressive versus baseline JPEGs.

If you have a PDF with the problem, are you able to extract the recalcitrant image and have a look at its properties? Or send me the PDF via a PM and I'll have a looksee.

I'm afraid I didn't keep the PDFs after I'd created a nice clean epub version. If I come across any in the future I'll PM you.
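For anyone who does have one of these images to hand, BetterRed's progressive-versus-baseline theory is easy to check, because the coding type is recorded in the JPEG's start-of-frame marker (0xFFC0 for baseline, 0xFFC2 for progressive). Here is a minimal Python sketch, assuming the image has been saved as an ordinary JPEG file. The function name `jpeg_coding` is made up for illustration, and many image viewers report the same information.

```python
import struct
import sys

# Start-of-frame (SOF) marker codes: the byte after 0xFF says how the image was coded.
SOF_MARKERS = {
    0xC0: "baseline",
    0xC1: "extended sequential",
    0xC2: "progressive",
    0xC3: "lossless",
    0xC9: "extended sequential (arithmetic)",
    0xCA: "progressive (arithmetic)",
    0xCB: "lossless (arithmetic)",
}


def jpeg_coding(data: bytes) -> str:
    """Return the coding type from the first SOF marker in a JPEG byte string."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG (missing SOI marker)")
    i = 2
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # padding byte between markers, skip it
            i += 1
            continue
        if marker in SOF_MARKERS:
            return SOF_MARKERS[marker]
        if marker in (0xD9, 0xDA):  # end of image / start of scan reached with no SOF
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # markers with no length field
            i += 2
            continue
        # Every other segment has a 2-byte big-endian length that includes itself.
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + length
    raise ValueError("no start-of-frame marker found")


if __name__ == "__main__":
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            print(path, jpeg_coding(f.read()))
```

Run it as `python jpegcheck.py image1.jpg image2.jpg` (the script name is arbitrary). If the shredded images all turned out to be progressive and the good ones baseline, that would support the theory.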

In case it sheds any more light, my PDF conversion method is:

1. Use the pdf2xml.exe utility, which is the basis of the MobiPocket converter, to extract an XML file of the text plus the images. For some reason, only some of the images are extracted.
2. Use homegrown software to convert the XML to simple, clean HTML, preserving the styling and structure (headings, scene breaks, drop caps, dehyphenation, etc.) that are normally lost in typical PDF-to-epub conversions.
3. Use calibre to convert the HTML to epub.
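Step 2 is homegrown code that isn't described in detail, but one of the clean-ups it mentions, dehyphenation, is easy to illustrate. Here is a minimal Python sketch, assuming the lines of one paragraph have already been pulled out of the XML as plain strings. The function `dehyphenate` is hypothetical and is not the actual tool.

```python
def dehyphenate(lines: list[str]) -> str:
    """Join a paragraph's lines, rejoining words split by a hyphen at a line end."""
    text = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines inside the paragraph
        # Heuristic: a letter followed by "-" at the end of the previous line,
        # and a lowercase letter starting this one, means a word was split.
        if text.endswith("-") and len(text) > 1 and text[-2].isalpha() and line[0].islower():
            text = text[:-1] + line  # drop the hyphen and glue the halves together
        elif text:
            text += " " + line  # ordinary line break becomes a space
        else:
            text = line
    return text


print(dehyphenate(["The conver-", "sion was clean."]))
```

This heuristic can't tell a line-break hyphen from a genuine one ("well-" / "known" would become "wellknown"), so a real converter would probably also check the joined word against a dictionary.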

The shredded images were produced in step 1, during the pdf2xml.exe extraction. As this program doesn't seem to accept any input parameters, I couldn't experiment further.