01-21-2010, 01:30 PM   #11
chaley
I have had good luck converting single-column PDFs with these steps:
1. Crop the margins of the PDF to the text area.
2. Save as RTF.
3. Use Calibre to convert the RTF to EPUB.
4. Use Sigil to fix lines broken across page breaks (and a few other stray breaks).
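
For anyone doing step 3 on several files at once, here is a rough sketch that calls Calibre's ebook-convert command-line tool from Python. It assumes ebook-convert is on your PATH and uses default conversion settings; the helper names build_convert_command and convert_folder are made up for illustration, and the GUI does the same job.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(rtf_path: Path) -> list[str]:
    """Return the ebook-convert call that writes an EPUB next to the RTF."""
    return ["ebook-convert", str(rtf_path), str(rtf_path.with_suffix(".epub"))]


def convert_folder(folder: Path) -> None:
    """Convert every .rtf file in folder (hypothetical helper, default options)."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH")
    for rtf_path in sorted(folder.glob("*.rtf")):
        # check=True stops the batch as soon as one conversion fails
        subprocess.run(build_convert_command(rtf_path), check=True)
```

Calling convert_folder(Path("cropped")) would convert everything in a folder named cropped; the line-break cleanup in Sigil still has to be done by hand afterwards.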

This method preserves all the character formatting, and (with one exception) the 10 or so files I have converted haven't needed much repair work.

The one exception involved fixing up chapter headings. The original document had chapters in the form
ROMAN_NUMERAL.
CHAPTER TITLE
followed by a few empty lines. The easiest way for me to fix this was to use Vim's global search and replace on the .html file that came out of the EPUB conversion. I used a regexp that matched the two lines and replaced them with a single line wrapped in <h1></h1> tags. I tried using Sigil, but couldn't figure out how to write a multi-line match expression (I admit I didn't look for long).
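
If you'd rather not use Vim, the same kind of two-line match can be sketched in Python. This is not the exact expression I used. It assumes each heading line ended up in its own <p> element on its own line, which may not match what your conversion produces, so adjust the pattern to your markup. The name merge_chapter_headings is just for illustration.

```python
import re

# Assumed markup: numeral and title each sit in their own <p> on their own line.
HEADING = re.compile(
    r"^<p>([IVXLCDM]+)\.</p>[ \t]*\n"  # line 1: Roman numeral followed by a period
    r"<p>([^<\n]+)</p>[ \t]*$",         # line 2: the chapter title
    re.MULTILINE,                       # let ^ and $ match at every line boundary
)


def merge_chapter_headings(html: str) -> str:
    """Join each numeral/title pair into a single <h1> line."""
    return HEADING.sub(r"<h1>\1. \2</h1>", html)


sample = "<p>IV.</p>\n<p>THE STORM</p>\n<p>It rained all night.</p>\n"
print(merge_chapter_headings(sample))
```

The trick is the literal \n in the middle of the pattern: it lets a single expression span both lines, which is the part I couldn't work out how to do in Sigil.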