Convert PDF to epub?


virtual_ink
08-23-2011, 03:15 AM
I have a few books only available to me as PDFs (and Quark files - but I don't have Quark).

Any recommended tutorials for converting PDFs to epubs?

Toxaris
08-23-2011, 04:30 AM
Converting PDF to ePUB is cumbersome, to say the least. Your results depend on the PDF, of course. Sometimes Calibre gives reasonable results, as do some other tools. No tool gives a perfect conversion.

You could try running the PDF through ABBYY, but that would also leave you with manual cleanup.

Common issues:
- wrong OCR
- paragraphs at the wrong location or not at all
- totally wrong layout
- missing text
- headers and footers (including page numbers) throughout the text

mrmikel
08-23-2011, 06:51 AM
You can add duplicated caption text to the list of possible errors.

Some people use Mobipocket Creator and feed its html output into Calibre or Sigil.

Whatever you do, plan on some work.

BTW these only work if the PDFs have actual text and are not just containers for scans of the page. PDFs containing just images will have to be OCRed and the resulting product, often a mess, cleaned up. Then you get to appreciate that a 2% error rate means an error on every page, times the number of pages to correct.

Toxaris
08-23-2011, 08:02 AM
Hmm, I get an error rate of less than 2% with OCR, depending on the source. Sometimes it is even closer to 0.2%.

Jellby
08-23-2011, 10:35 AM
An error of 2% means 1 error every 50... what? Every 50 characters? That's unacceptable. Every 50 pages? That's too good to be true. Every 50 lines? That could be.
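
To make the units concrete, here is a rough back-of-the-envelope sketch. The page dimensions below (1800 characters, 300 words, 35 lines per page) are illustrative assumptions, not figures given by anyone in this thread.

```python
# Illustrative page dimensions; these are assumptions, not measurements.
CHARS_PER_PAGE = 1800
WORDS_PER_PAGE = 300
LINES_PER_PAGE = 35


def errors_per_page(error_rate, units_per_page):
    """Expected number of errors on one page for a given per-unit error rate."""
    return error_rate * units_per_page


for label, units in [
    ("character", CHARS_PER_PAGE),
    ("word", WORDS_PER_PAGE),
    ("line", LINES_PER_PAGE),
]:
    # The same 2% figure gives very different workloads depending on the unit.
    print("2% per {}: about {:.1f} errors per page".format(
        label, errors_per_page(0.02, units)))
```

Under these assumptions, the same 2% figure ranges from dozens of corrections per page (per character) to less than one per page (per line), which is why the unit matters.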

Adjust
08-23-2011, 06:17 PM
For PDFs, I do a Save As to a Word file in Acrobat 8 or later (with any lower version, forget it; the result is terrible), then bring it into InDesign and do a basic format, applying styles etc.

You do have to look for words with hard hyphens like- this and paragraphs which start with lowercase letters, but a GREP search in InDesign will get almost all of them, and it doesn't take long to go through them one by one.
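
For anyone without InDesign, the same two checks can be sketched with ordinary regular expressions. This is a minimal Python sketch, not the GREP queries used above; the pattern and the function names `find_suspects` and `join_hard_hyphens` are made up for illustration.

```python
import re

# A hyphen followed by whitespace and a lowercase letter, as in "conver- sion".
HARD_HYPHEN = re.compile(r"(\w)-\s+([a-z])")


def find_suspects(paragraphs):
    """Return (index, reason) pairs for paragraphs worth checking by hand."""
    suspects = []
    for i, para in enumerate(paragraphs):
        if HARD_HYPHEN.search(para):
            suspects.append((i, "hard hyphen"))
        if para[:1].islower():
            suspects.append((i, "starts with lowercase"))
    return suspects


def join_hard_hyphens(text):
    """Rejoin words split by a hard hyphen; review the result afterwards."""
    return HARD_HYPHEN.sub(r"\1\2", text)


if __name__ == "__main__":
    sample = ["The conver- sion failed.", "and then it continued.", "Fine."]
    for index, reason in find_suspects(sample):
        print(index, reason)
```

The pattern also matches legitimate constructions such as "pre- and post-processing", so stepping through the hits one by one, as described above, is safer than a blind replace-all.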

Markzware make a Quark-to-InDesign converter, which is a must; I use it to convert Quark documents to InDesign.

http://markzware.com/products/q2id/

wannabee
08-26-2011, 01:39 AM
I just double-click Quark files and they open in InDesign with no plugin.

Thasaidon
08-30-2011, 06:15 AM
I have had reasonable results using Nitro PDF Viewer (freeware) to export the text in the PDF to a text file. I then converted the file to HTML and used Calibre to convert the HTML to epub.
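
The text-to-HTML step can be sketched in a few lines of Python. This is only one possible approach, not necessarily how it was done here: it assumes paragraphs in the exported text are separated by blank lines, and the function name `text_to_html` is made up for illustration.

```python
import html
import re


def text_to_html(text, title="Untitled"):
    """Wrap blank-line-separated paragraphs of plain text in minimal HTML."""
    # Assumption: a blank line marks a paragraph break in the exported text.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    body = "\n".join(
        # Collapse line wraps inside a paragraph and escape &, < and >.
        "<p>{}</p>".format(html.escape(" ".join(p.split())))
        for p in paragraphs
    )
    return (
        '<html><head><meta charset="utf-8"><title>{}</title></head>\n'
        "<body>\n{}\n</body></html>\n"
    ).format(html.escape(title), body)


if __name__ == "__main__":
    print(text_to_html("First line\nwrapped.\n\nSecond & last.", "Sample"))
```

After saving the result to a file (say book.html, a placeholder name), it can be added to Calibre as usual or converted with Calibre's command-line tool: ebook-convert book.html book.epub.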

Toxaris
08-30-2011, 06:56 AM
Convert to a text file? How about the layout and characteristics like italic? You will lose those...

Adjust
08-30-2011, 07:37 AM
Saving to a Word file from Acrobat retains most of the formatting.

Thasaidon
08-30-2011, 07:48 AM
The text file produced by Nitro does retain most of the formatting layout (without using text boxes). I did not get one great splurge of unformatted text.

With regard to italics, I cannot remember whether it detected them, as most of the items I converted contained little or no italic text.

I only use this method when other methods do not produce a satisfactory result.

Try it out and see what you think. I have really found the free Nitro PDF reader to be very good and I use it in preference to the Adobe offerings.

DiapDealer
08-30-2011, 08:30 AM
I crop with Adobe Acrobat, then export as HTML and proceed to regex the piss out of it. I've also had decent luck with PDFMasher, but then you need to add images and a lot of formatting back in (and it's still pretty experimental). I think it's always going to be a fairly hands-on affair to convert PDFs.
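
As a rough idea of what that regex pass might look like, here is a minimal Python sketch covering two of the issues listed earlier in the thread: stray page numbers and paragraphs broken mid-sentence. It assumes the export uses bare <p>...</p> tags with no attributes, which real Acrobat output may well not match, and the name `clean_html` is made up for illustration.

```python
import re

# Assumes simple <p>...</p> markup with no attributes; adjust for real exports.
# A paragraph that contains nothing but digits, i.e. a stray page number.
PAGE_NUMBER_PARA = re.compile(r"<p>\s*\d+\s*</p>\s*")
# A paragraph break where the next paragraph starts with a lowercase letter.
SPLIT_PARA = re.compile(r"\s*</p>\s*<p>(?=[a-z])")


def clean_html(markup):
    """Drop page-number paragraphs, then rejoin paragraphs split mid-sentence."""
    markup = PAGE_NUMBER_PARA.sub("", markup)
    return SPLIT_PARA.sub(" ", markup)


if __name__ == "__main__":
    raw = "<p>The text runs</p>\n<p>12</p>\n<p>on to the next page.</p>"
    print(clean_html(raw))
```

Regexes like these are blunt instruments on HTML, so work on a copy and check the result by eye.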