MobileRead Forums - View Single Post

chaley · 01-21-2010, 02:30 PM

I have had good luck converting single-column PDFs by:
1. Cropping the PDF's margins to the text area.
2. Saving it as RTF.
3. Using Calibre to convert the RTF to EPUB.
4. Using Sigil to rejoin lines broken across page breaks (and to fix a few other problems).
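The page-break clean-up in step 4 can also be scripted for the most common case. Below is a minimal Python sketch that works under stated assumptions. It assumes the EPUB's HTML keeps each paragraph in its own `<p>` element. It also relies on a simple heuristic of its own, not anything Calibre or Sigil is documented to do, to decide where a page break split a paragraph. The function name `rejoin_split_paragraphs` is illustrative. The heuristic misses splits that occur before capitalised words and proper nouns, so review the result rather than trusting it blindly.

```python
import re

# Heuristic (an assumption, not a Calibre or Sigil feature): a paragraph that
# ends in a lowercase letter, comma or semicolon, immediately followed by a
# paragraph that starts with a lowercase letter, was probably split by a
# page break in the original PDF.
SPLIT_AT_PAGE_BREAK = re.compile(
    r"(?<=[a-z,;])"            # previous paragraph ends mid-sentence
    r"</p>\s*<p(?:\s[^>]*)?>"  # close tag, whitespace, next <p> (any attributes)
    r"(?=[a-z])"               # next paragraph starts in lowercase
)


def rejoin_split_paragraphs(html: str) -> str:
    """Replace each suspected page-break split with a single space."""
    return SPLIT_AT_PAGE_BREAK.sub(" ", html)


sample = (
    "<p>It was a dark and</p>\n"
    '<p class="c1">stormy night,</p>\n'
    "<p>and the rain fell.</p>\n"
    "<p>Next paragraph.</p>"
)
print(rejoin_split_paragraphs(sample))
# <p>It was a dark and stormy night, and the rain fell.</p>
# <p>Next paragraph.</p>
```

The pattern uses lookbehind and lookahead instead of capturing the neighbouring letters. As a result, a paragraph split across two consecutive page breaks is still joined in a single pass.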

This method preserves all the character formatting, and with one exception, none of the 10 or so files I have converted has needed much repair work.

The exception involved chapter headings. The original document formatted them as
ROMAN_NUMERAL.
CHAPTER TITLE
followed by a few empty lines. The easiest fix for me was VIM's global search & replace on the .html file produced by the EPUB conversion. I used a regexp that matched the two lines and replaced them with a single line wrapped in <h1></h1> tags. I also tried Sigil, but couldn't figure out how to write a multi-line match expression (though I admit I didn't spend long looking).
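For anyone without VIM handy, here is a rough Python equivalent of that heading fix. It is a sketch under several assumptions, not the regexp described above. It assumes the converted HTML puts each source line in its own `<p>` element, with or without a class attribute. It also assumes the numeral is an uppercase Roman numeral followed by a period and that the title contains no inline tags. Finally, it assumes the empty lines became empty or `&nbsp;`-only paragraphs, which it removes as well. The names `P_OPEN`, `CHAPTER_HEADING` and `wrap_chapter_headings` are hypothetical.

```python
import re

# Opening <p> tag, with or without attributes such as class="calibre1".
P_OPEN = r"<p(?:\s[^>]*)?>"

# Assumed shape of the converted HTML (one source line per <p> element):
#   <p>IV.</p>
#   <p>THE TITLE</p>
#   <p>&nbsp;</p>   <- zero or more empty lines after the heading
CHAPTER_HEADING = re.compile(
    P_OPEN + r"\s*([IVXLCDM]+)\.\s*</p>\s*"         # Roman numeral and period
    + P_OPEN + r"\s*([^<]+?)\s*</p>"                # chapter title (no inline tags)
    + r"(?:\s*" + P_OPEN + r"(?:\s|&nbsp;)*</p>)*"  # trailing empty paragraphs
)


def wrap_chapter_headings(html: str) -> str:
    """Collapse 'NUMERAL.' + 'TITLE' paragraph pairs into one <h1> line."""
    return CHAPTER_HEADING.sub(r"<h1>\1. \2</h1>", html)


sample = (
    '<p class="c1">IV.</p>\n'
    '<p class="c1">THE RETURN</p>\n'
    '<p class="c1">&nbsp;</p>\n'
    '<p class="c1"></p>\n'
    '<p class="c1">It was late.</p>'
)
print(wrap_chapter_headings(sample))
# <h1>IV. THE RETURN</h1>
# <p class="c1">It was late.</p>
```

The multi-line match works because the regex runs over the whole file at once rather than line by line. In Python's `re`, `\s` also matches a newline, so `\s*` can span the line break between the two paragraphs.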
