|06-01-2010, 09:46 PM||#1|
Join Date: May 2010
Device: Kobo (loaned out), KoboTouch (given away), KoboVox, KoboGlo, AuraHD
removing hard line endings
I am sorry if this has already been answered somewhere else. I am using Calibre to convert PDF to EPUB and am unable to find an easy way to deal with the hard line endings. Removing them all manually is not really an option, as the documents are about 500 pages long (and I don't know how to do it anyway). Any suggestions would be great.
|06-01-2010, 10:32 PM||#2|
Join Date: Jan 2010
Device: Nexus One
I'm not sure it's fully possible in calibre. There is the PDF line-unwrap option, but if that isn't doing it for you, there's no way to edit the line breaks manually in calibre.
I did do it once for a problematic PDF by converting it to RTF, then opening it in Office.
Using regex in the search and replace box there, you can find all VALID paragraph breaks, temporarily replace them with a placeholder code (12345 or something), then replace all remaining paragraph breaks with a space, then re-replace the placeholder with paragraph breaks.
It's been a while, but if I recall, I decided that valid paragraph breaks were those that were preceded by a period, question mark, exclamation point, or quotation mark.
You may end up with a few mistakes using this method, but by and large, it unwrapped the hard line endings from the pdf.
Afterwards, just add the RTF to calibre and continue your conversion process.
While you're busy editing the rtf, you can go ahead and make sure chapter headings are using the 'heading' style, too, so calibre will correctly find them.
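If you'd rather script it than use the search and replace box, here is a rough Python sketch of the same placeholder trick. It assumes the text has been saved as plain text with one hard line ending per line. The function name and the `@@PARA@@` placeholder are made up for illustration, and the rule for a "valid" break is just the one described above (line ends in a period, question mark, exclamation point, or straight double quote), so expect the same occasional mistakes.

```python
import re

# Hypothetical marker; pick anything that never occurs in the book text.
PLACEHOLDER = "@@PARA@@"

def unwrap_hard_line_endings(text: str) -> str:
    # 1. Protect "valid" paragraph breaks: a line break (plus any blank
    #    lines after it) that follows . ? ! or a straight double quote.
    protected = re.sub(r'([.?!"])[ \t]*\n\s*', r'\1' + PLACEHOLDER, text)
    # 2. Treat every remaining line break as a hard wrap and join with a space.
    joined = re.sub(r'\s*\n\s*', ' ', protected)
    # 3. Put the protected paragraph breaks back.
    return joined.replace(PLACEHOLDER, '\n')

if __name__ == "__main__":
    sample = "This is a line\nthat wraps.\nNew paragraph here\ncontinues on.\n"
    print(unwrap_hard_line_endings(sample))
```

Lines that end a sentence exactly at the wrap point will be wrongly treated as paragraph ends, and headings without final punctuation will be merged into the following text, so a quick skim of the result is still worthwhile.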
|06-01-2010, 11:18 PM||#3|
Join Date: Sep 2009
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O/GloHD
Until Calibre gets its new PDF engine, you might like to take a look at a new utility called "pdfreflow" posted over in the PDF forum.
It converts your PDF to HTML and reflows the paragraphs as best it can from the info you input. You would then import the HTML to calibre for conversion to EPUB.
I have been pleasantly surprised with the results I've had so far, although it's not guaranteed to work with all PDFs.