03-29-2012, 07:34 PM   #5
prcek
Device: iPad3,Kindle DX+Fire
I finally figured out more about what's going on. It appears that some of these books have quite a few pages where DCT and deflate images sit next to (or maybe even overlay) each other to provide shading around the DCT/JPEG pictures, and when these deflate images are saved as PNG files, the resulting XML/HTML is severely messed up. I've tried every PDF-to-HTML/XML tool I could find, and they all seem to have more or less the same problem: the result might be mangled slightly differently, but it's basically unusable in every case.

My "fix" in pdftohtml worked only because it completely ignored all of these little deflate/PNG images (the call to OutputDev::drawImage for them turns out to be a no-op).

The obvious question now is: what is the easiest way to fix this? I know nothing about the layout code, so I have no clue how easy or hard it might be to glue these images together correctly (i.e. the way they're supposed to be shown, and do show in Acrobat), or even to somehow ignore them automatically.
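
To make the "glue them together" idea a bit more concrete, here is a rough sketch of how the pieces could at least be grouped. It is not how pdftohtml works, and every name in it is made up for illustration. It assumes each image's placement rectangle on the page is already known (for example, derived from the transformation matrix in effect when the image is drawn) and simply clusters the rectangles that overlap or touch within a small tolerance, so each cluster could then be rendered as one picture.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (x0, y0, x1, y1) in page coordinates.
Box = Tuple[float, float, float, float]


def touches(a: Box, b: Box, tol: float = 0.5) -> bool:
    """True if the rectangles overlap or lie within `tol` units of each other."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol and
            a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def group_images(boxes: List[Box], tol: float = 0.5) -> List[List[int]]:
    """Group box indices that are (transitively) adjacent or overlapping."""
    parent = list(range(len(boxes)))

    def find(i: int) -> int:
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if touches(boxes[i], boxes[j], tol):
                parent[find(j)] = find(i)

    groups: Dict[int, List[int]] = {}
    for i in range(len(boxes)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def bounding_box(boxes: List[Box], group: List[int]) -> Box:
    """Smallest rectangle covering every box in the group."""
    return (min(boxes[i][0] for i in group),
            min(boxes[i][1] for i in group),
            max(boxes[i][2] for i in group),
            max(boxes[i][3] for i in group))


if __name__ == "__main__":
    # Invented example: one JPEG with two thin shading strips, plus an
    # unrelated image elsewhere on the page.
    page = [
        (100, 100, 300, 250),   # DCT/JPEG picture
        (95, 100, 100, 250),    # deflate strip on the left edge
        (300, 100, 305, 250),   # deflate strip on the right edge
        (400, 500, 450, 550),   # unrelated image
    ]
    for group in group_images(page):
        print(group, bounding_box(page, group))
```

A group with more than one member would be a candidate for compositing into a single output image (or for dropping the thin deflate pieces); the tolerance value is a guess and would need tuning against real pages.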

Any ideas?

BTW, if it would help to have a sample page (one that exhibits the problem) to look at, just let me know how and where to post it. It's trivial to extract a single sample page into a tiny PDF.

Thanks
PeterK