MobileRead Forums - PDF conversion help

ardeegee · 11-02-2009, 03:25 PM

I'm sure this has been covered before, but I'm not sure what search terms I'd use to dig it up.

Scans of books made by Google, Microsoft, and others seem to be stored as multiple layers per page, with at least two layers that I can see: one of mostly just text (and other areas of black) and one of mostly everything else in color. This is done, I assume, to make OCRing easier. When you open one of these PDF files, you can often see the background layer appear before the text/black layer is overlaid on it, which is fine when you are just viewing it as a PDF.
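To make the layering concrete, here is a minimal sketch of how a viewer could combine the two layers into the page you actually see. It assumes the "black" layer is a 1-bit mask (1 = ink, 0 = transparent) and that both layers have the same pixel dimensions; in real scans the layers may well differ in resolution and would need scaling first. The names `composite`, `background`, and `mask` are made up for illustration and do not come from any PDF tool.

```python
def composite(background, mask, ink=(0, 0, 0)):
    """Overlay a 1-bit text mask on a color background, pixel by pixel.

    background: rows of (r, g, b) tuples (the color layer)
    mask:       rows of 0/1 values (assumed: 1 = ink, 0 = transparent)
    """
    if len(background) != len(mask):
        raise ValueError("layers must have the same height")
    page = []
    for bg_row, mask_row in zip(background, mask):
        if len(bg_row) != len(mask_row):
            raise ValueError("layers must have the same width")
        # Where the mask is set, paint ink; elsewhere the background shows through.
        page.append([ink if m else bg for bg, m in zip(bg_row, mask_row)])
    return page


# Tiny 4x3 "page": a uniform paper-colored background plus a few ink pixels.
paper = (240, 230, 200)
background = [[paper] * 4 for _ in range(3)]
mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
page = composite(background, mask)
```

If an export tool writes out the mask and the background as separate images, neither one looks like the page on its own, which would be consistent with the broken exports described below.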

But how do you convert that to other formats and have it look correct? I have a series of public-domain booklets that I gathered several months ago from various archives (some from Google, some from the Internet Archive, etc.). They work as PDFs on my Sony Reader, but page changes are slooooooow, so I wanted to convert them all to EPUBs (for myself and to share). Calibre failed utterly at converting them correctly. I have a full retail copy of an ancient version of Acrobat (version 5), which allows full editing of PDFs, but on this old version, at least, when I export the pages they come out as their individual layers, with the "black" layer garbled and useless.

Anyone have a solution? These are the files:

http://www.sendspace.com/file/okec99
