06-21-2010, 01:50 AM   #2
kiwikobo
Device: Kobo
Conversion of PDF to Epub is very flaky, and I think it would pay to remember that Calibre is showing at version 0.7, very much pre-beta. The fact that it works at all is a great benefit!

While we're waiting for conversions that preserve more of the formatting, you need to edit the stylesheet by hand (or rather, replace it wholesale). But I think you're referring to the common problem Calibre has with PDF files that contain hard line-breaks: the output Epub ends up with every line as its own paragraph. Very annoying.

One solution I use is to have Calibre convert the PDF into an RTF file, then edit the RTF and remove the hard line-breaks. If you are comfortable with regular expressions this isn't too hard, but it's possible even in Word. You're trying to replace every paragraph break that comes before a lower-case letter with a space, so do Edit/Replace, select "Match Case", and change every instance of "^p^ta" to " a" (note the leading space). Do the same with "^p^tb" to " b" and so on through to "^p^tz" to " z". Now remove all multiple spaces (replace "[space][space]" with "[space]", repeating until none are left), do any other fiddling you wish to do (e.g. removing tabs or whatever), and save the document.

Go back into Calibre and convert the RTF file into an ePub and it should be a lot closer. It sounds a pain but only takes a couple of minutes. With a regular expression editor it's even quicker.
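
For anyone going the regular-expression route, here is a minimal Python sketch of the same clean-up. It assumes you have saved the text as plain text rather than raw RTF, and that a line starting with a lower-case letter (optionally indented with a tab or spaces) is the continuation of the previous line. The function name join_hard_breaks is made up for this example, and the rule is only a rough heuristic, so check the result before converting.

```python
import re

def join_hard_breaks(text: str) -> str:
    """Join lines that start with a lower-case letter onto the previous line."""
    # A line break (plus any tab/space indent around it) followed by a
    # lower-case letter is treated as a mid-sentence break and replaced
    # with a single space. This mirrors the "^p^ta" -> " a" replacements.
    text = re.sub(r"[ \t]*\r?\n[ \t]*(?=[a-z])", " ", text)
    # Collapse any run of two or more spaces into one.
    text = re.sub(r" {2,}", " ", text)
    return text

if __name__ == "__main__":
    sample = "The quick brown\n\tfox jumps.\nNext paragraph  here."
    print(join_hard_breaks(sample))
```

Unlike the Word approach, this handles all 26 letters in one pass. It will wrongly merge a genuine paragraph that happens to start with a lower-case letter, just as the manual method does.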


Charles