

PDF conversion help


ardeegee
11-02-2009, 04:25 PM
I'm sure this has been covered before, but I'm not sure what search terms I'd use to dig it up.

Scans of books made by Google, Microsoft, and others seem to have multiple layers per page, with at least two that I can see: one of mostly just text (and other areas of black) and one of mostly every other color. This is done, I assume, to make OCRing easier. When you open a PDF file, you can often see the background layer appear before the text/black layer is overlaid on it, which is fine when you are just viewing it as a PDF.

But how do you convert that to other formats and have it look correct? I have a series of public-domain booklets that I gathered several months ago from various archives (some from Google, some from the Internet Archive, etc.). They work as PDFs on my Sony Reader, but page changes are slooooooow, so I wanted to convert them all to EPUBs (for myself and to share). Calibre failed utterly at converting them correctly. I have a full retail copy of an ancient version of Acrobat (version 5) that allows full editing of PDFs, but, on this old version at least, when I export the pages they come out as their individual layers, with the "black" layer garbled and useless.

Anyone have a solution? These are the files:

http://www.sendspace.com/file/okec99

nrapallo
11-02-2009, 05:14 PM
Though a bit cumbersome, you can use PDFRead 1.8 (http://www.mobileread.com/forums/showthread.php?p=159387) to convert those scanned .pdf ebooks into either the Sony .lrf format or Mobipocket .prc format (and then use calibre to convert the .prc to .epub).

PDFRead has no .epub support, but it can be made to retain the .html and images used to create the resulting ebooks (by placing an empty file named "debug" in the PDFRead install directory), and those could then easily be used with calibre or Sigil to yield an .epub ebook!
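For anyone scripting this, the empty "debug" marker file can be created with a few lines of Python. This is only a rough sketch: the function name enable_pdfread_debug is made up for illustration, and the install directory is whatever path PDFRead lives at on your machine (it is not stated in this thread).

```python
from pathlib import Path


def enable_pdfread_debug(install_dir):
    """Create an empty file named "debug" inside the given directory.

    install_dir is an assumption: pass the folder where PDFRead is
    installed on your own system.
    """
    install_path = Path(install_dir)
    if not install_path.is_dir():
        raise NotADirectoryError(f"{install_path} is not a directory")

    marker = install_path / "debug"
    # touch() creates a zero-byte file; exist_ok avoids an error on re-runs.
    marker.touch(exist_ok=True)
    return marker
```

Deleting the "debug" file afterwards should presumably restore the normal behavior, though that is not confirmed here.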

See the attached samples from "Japanese Fairy Tale Series 01 #01- Momotaro.pdf". Note that the .lrf and .prc were created by PDFRead using two successive "runs", and the .epub was created using calibre from that .prc.

It seems PDFRead is a good fit. Have fun converting all of them!

ardeegee
11-03-2009, 02:55 AM
Spent a long time playing with PDFRead, but couldn't find a way to make the images use the full vertical space (and be centered horizontally), so I ended up having to export them as individual images, archive them as CBRs, and use comiclrf to make them into LRFs. The LRFs mostly look okay, considering the source material. The EPUBs, on the other hand, ended up looking horrible for all of them. But here are the LRFs I made.

http://www.sendspace.com/file/zy6qaj
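
The "export as individual images, then archive" step can be scripted roughly as below. Note the swap: this sketch writes a CBZ (a plain ZIP of page images) rather than a CBR, because Python's standard library can write ZIP but not RAR. Whether comiclrf accepts CBZ input is an assumption to check, not something confirmed in this thread. The function name pack_cbz and the accepted file extensions are illustrative choices.

```python
import zipfile
from pathlib import Path

# Extensions treated as page images (an illustrative choice).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def pack_cbz(image_dir, cbz_path):
    """Pack the page images found in image_dir into a CBZ archive.

    Pages are added in sorted filename order, so zero-padded names
    (page_001.png, page_002.png, ...) keep the reading order intact.
    Returns the list of filenames written to the archive.
    """
    images = sorted(
        p for p in Path(image_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    # ZIP_STORED skips recompression; PNG and JPEG are already compressed.
    with zipfile.ZipFile(cbz_path, "w", zipfile.ZIP_STORED) as archive:
        for image in images:
            archive.write(image, arcname=image.name)
    return [image.name for image in images]
```

Called as pack_cbz("momotaro_pages", "momotaro.cbz") (both names hypothetical), it returns the page filenames in the order they were stored.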