View Full Version : PDF conversion help

11-02-2009, 04:25 PM
I'm sure this has been covered before, but I'm not sure what search terms I'd use to dig it up.

Scans of books made by Google, Microsoft, and others seem to be in multiple layers per page, with at least 2 layers that I can see-- one of mostly just text (and other areas of black) and one of mostly just every other color. Done, I assume, to make OCRing easier. When you open a PDF file, you can often see the background layer appear before the text/black is overlaid on it. Which is fine, when you are just viewing it as a PDF.

But how do you convert that to other formats and have it look correct? I have a series of public-domain booklets that I gathered together several months ago from various archives (some on Google, some on the Internet Archive, etc.). They work as PDFs on my Sony Reader, but page changes are slooooooow. So I wanted to convert them all to EPUBs (for myself and to share). Calibre failed utterly in converting them correctly. I have a full retail copy of an ancient version of Acrobat (version 5) which allows full editing of PDFs, but-- on this old version, at least-- when I export the pages, they come out as their individual layers, with the "black" layer garbled and useless.

Anyone have a solution? These are the files:

11-02-2009, 05:14 PM
Though a bit cumbersome, you can use PDFRead 1.8 to convert those scanned .pdf ebooks into either the Sony .lrf format or Mobipocket .prc format (and then use calibre to convert the .prc to .epub).

There is no .epub support within PDFRead, but it can also be made to retain the .html and images used to create the resulting ebooks (using an empty file named "debug" in the PDFRead install directory) and thus that could easily be used with calibre or Sigil to yield a .epub ebook!
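For anyone who would rather script the "debug" trick than create the file by hand, here is a minimal sketch. It only does what is described above: it creates an empty file named "debug" in a directory you point it at. The function name and the example path are hypothetical, and where PDFRead is actually installed on your machine is something you would need to fill in yourself.

```python
from pathlib import Path


def enable_pdfread_debug(install_dir: str) -> Path:
    """Create an empty file named 'debug' in install_dir.

    install_dir is assumed to be the PDFRead install directory;
    this function does not check that PDFRead is really there.
    """
    marker = Path(install_dir) / "debug"
    # touch() creates the file if it is missing and leaves an
    # existing file's contents alone.
    marker.touch(exist_ok=True)
    return marker


if __name__ == "__main__":
    # Hypothetical path, replace with your own install location.
    print(enable_pdfread_debug("."))
```

Deleting the "debug" file again should return PDFRead to its normal behaviour, assuming it only checks for the file's presence.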

See the attached samples from "Japanese Fairy Tale Series 01 #01- Momotaro.pdf" noting that the .lrf and .prc were created by PDFRead using two successive "runs" and the .epub was created using calibre from that .prc.

It seems PDFRead is a good fit. Have fun converting all of them!

11-03-2009, 02:55 AM
Spent a long time playing with PDFRead, but couldn't find a way to make the images use the full vertical space (and be centered on the horizontal), so I ended up having to export them as individual images, archive them as CBRs, and use comiclrf to make them into LRFs. The LRFs mostly look okay, considering the source material. The EPUBs, on the other hand, ended up looking horrible for all of them. But here are the LRFs I made.
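The "export as images, then archive" step can be scripted. The sketch below is not the exact method used above: it builds a CBZ (a plain zip archive of page images) rather than a CBR, because Python's standard library can write zip but not RAR. The function name, the folder layout, and the list of image extensions are all assumptions, and whether your comic converter accepts CBZ is something to check.

```python
import zipfile
from pathlib import Path
from typing import List

# Assumed set of page-image extensions; adjust to match your export.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".gif"}


def pack_cbz(image_dir: str, out_path: str) -> List[str]:
    """Zip the images in image_dir into out_path, sorted by file name.

    Returns the page names in the order they were written.
    """
    pages = sorted(
        (p for p in Path(image_dir).iterdir()
         if p.is_file() and p.suffix.lower() in IMAGE_EXTS),
        key=lambda p: p.name,
    )
    # ZIP_STORED: the images are already compressed, so no recompression.
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as zf:
        for page in pages:
            zf.write(page, arcname=page.name)
    return [p.name for p in pages]
```

Page order comes purely from the file names, so zero-padded names (page_001.png, page_002.png, ...) are the safe choice.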

01-12-2010, 07:59 AM
Actually, what you could do is use Rasterfarian. Have a look for manga2ebook and it will put you on its trail. manga2ebook can be used to convert CBR/CBZ to PDF, though it will mess up the page sequence, so calibre does a better job there. That PDF is a scanned PDF. Rasterfarian is a separate tool, which will rasterize and CONDENSE a PDF. E.g., from a 1.5 GB color PDF of a comic book, you may be able to obtain an LRF of about 200 MB, which is dramatic. Speed should be OK then. But the results are not always the same: if you have a B&W PDF which is almost without shades already, you may only achieve a 50% reduction.

01-13-2010, 02:22 PM
I find that PDFLRF and soPDF work better than either PDFRead (sorry Nick) or Rasterfarian for this job -- I'd recommend PDFLRF in particular.

I should say, Google already offers most of their public domain titles in both ePub and PDF format, though the ePubs are not that great.

Not sure why this thread isn't in the PDF forum.

01-13-2010, 03:47 PM
Oh, no offence taken! I'm always one to use the BEST tool at hand, and if PDFLRF does the better job, then so be it!

Thanks for recommending it based on your practical ("real world") experience. :)