MobileRead Forums - View Single Post - PDF -> HTML produces bogus PNG's

prcek · 03-26-2012, 02:40 PM

I'm trying to convert a bunch of PDFs to MOBI, and for some reason the conversion to HTML produces a ton of JPGs but also PNGs, and those PNGs are almost all bogus (all gray or similar junk). I've tried calibre and other tools and they all do the same thing. I tried building pdftohtml (the one in poppler) on my Linux box (I gave up on Windows) and found the following code in utils/HtmlOutputDev.cc near line 1200:

void HtmlOutputDev::drawImage(GfxState *state, Object *ref, Stream *str,
                              int width, int height, GfxImageColorMap *colorMap,
                              GBool interpolate, int *maskColors, GBool inlineImg) {

  ...
  ...

  if (dumpJPEG && str->getKind() == strDCT) {
    ...
    ...
  } else {
#ifdef ENABLE_LIBPNG
    // Dump the image as a PNG file. Much of the PNG code
    // comes from an example by Guillaume Cottenceau.
    ...
    ...
#else
    OutputDev::drawImage(state, ref, str, width, height, colorMap, interpolate,
                         maskColors, inlineImg);
#endif
  }
}

If I simply disable the LIBPNG section above, all of the images come out as JPGs (and they all look OK), but I've no idea whether this is a reasonable workaround and, if so, how to make a new calibre with this hacked pdftohtml.
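To make the branch above easier to reason about, here is a minimal standalone sketch of the decision it makes. It does not use poppler at all: StreamKind, ImagePath and chooseImagePath are hypothetical names invented for illustration, and pngEnabled stands in for the compile-time ENABLE_LIBPNG switch.

```cpp
// Standalone model of the if/else in the quoted drawImage() excerpt.
// Nothing here is poppler API; every name is hypothetical.

enum class StreamKind { DCT, Other };  // DCT = JPEG-encoded image stream

enum class ImagePath {
    DumpJpeg,       // the "dumpJPEG && strDCT" branch
    WritePng,       // the ENABLE_LIBPNG branch
    BaseDrawImage   // the #else branch: OutputDev::drawImage(...)
};

// pngEnabled models whether ENABLE_LIBPNG was defined at build time.
constexpr ImagePath chooseImagePath(bool dumpJPEG, StreamKind kind,
                                    bool pngEnabled) {
    if (dumpJPEG && kind == StreamKind::DCT) {
        return ImagePath::DumpJpeg;
    }
    return pngEnabled ? ImagePath::WritePng : ImagePath::BaseDrawImage;
}

// Compile-time checks of the three paths.
static_assert(chooseImagePath(true, StreamKind::DCT, true) == ImagePath::DumpJpeg);
static_assert(chooseImagePath(true, StreamKind::Other, true) == ImagePath::WritePng);
static_assert(chooseImagePath(true, StreamKind::Other, false) == ImagePath::BaseDrawImage);
```

Read this way, disabling the LIBPNG section only changes what happens to images that are not JPEG streams: they go to the base-class OutputDev::drawImage call instead of the PNG writer. Whether those images are then written out anywhere is not something the excerpt shows.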

Specifically, I use calibre under Win7, and after hours of trying to build poppler on Windows (following various sets of instructions I found on the web) I've concluded it's currently beyond my ability / patience. It's also not clear to me how to use a custom build of poppler with calibre. Say some kind soul tells me how to get it built on Win7 (VS2008 or VS2010 would be best, but I also have cygwin, mingw, git and who knows what else installed): do I simply copy pdftohtml.exe to the calibre directory, or is there more to it than that?

Any pointers would be greatly appreciated - thanks!

PeterK
