07-29-2012, 11:29 PM | #1 |
Junior Member
Posts: 1
Karma: 10
Join Date: Jul 2012
Device: Kobo Touch
|
PDF to EPUB conversion
This problem has probably been addressed elsewhere, and I suspect it has an easy solution, but I have very little experience with coding and at this point I'm stuck.
I used calibre to convert a PDF file to EPUB. The resulting file has a paragraph break (<p>) wherever a line of text ended in the PDF, which leaves a lot of blank lines throughout the ebook I was trying to read. I can delete the breaks manually in Sigil, but going through the entire text that way would be very time consuming. Since the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
07-30-2012, 02:12 AM | #2 | |
Well trained by Cats
Posts: 29,785
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
Think about it: what common pattern distinguishes most false line ends? A line that ends in a lower-case letter or a comma, followed by a line that starts in lower case. (Not perfect: lines that start with a quote mark or a proper name (capital) will be ignored.)
search: (?sm)([a-z,])</p>\s+<p[^>]*>([a-z])
replace: \1 \2
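For anyone who would rather run this outside Sigil, here is a minimal Python sketch of the same idea. It is a variant of the pattern above, not the exact expression: the tag match is narrowed to `[^>]*` so it cannot run past the end of the `<p ...>` tag, `\s*` is used so back-to-back tags also match, and lookarounds replace the two capture groups so that very short lines are still joined. The function name `join_false_breaks` and the sample markup are made up for illustration.

```python
import re

# Variant of the forum pattern (an assumption, not the exact original):
# - (?<=[a-z,])  previous paragraph ends in a lower-case letter or a comma
# - </p>\s*<p[^>]*>  the closing tag, optional whitespace, the next opening tag
# - (?=[a-z])    next paragraph starts with a lower-case letter
FALSE_BREAK = re.compile(r"(?<=[a-z,])</p>\s*<p[^>]*>(?=[a-z])")


def join_false_breaks(html: str) -> str:
    """Replace each suspected false paragraph break with a single space."""
    return FALSE_BREAK.sub(" ", html)


# Hypothetical sample resembling calibre's PDF output.
sample = (
    '<p class="calibre1">The quick brown fox jumps over</p>\n'
    '<p class="calibre1">the lazy dog.</p>\n'
    '<p class="calibre1">A new paragraph starts here,</p>\n'
    '<p class="calibre1">and continues on this line.</p>'
)

if __name__ == "__main__":
    print(join_false_breaks(sample))
```

The same caveat applies as with the find-and-replace: a wrapped line whose continuation begins with a capital letter or a quote mark is left alone, so some manual cleanup will still be needed.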
|
07-30-2012, 03:58 AM | #3 |
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|
Use a different converter, or different calibre settings, that do a better job of detecting paragraphs.
|
07-30-2012, 10:43 AM | #4 |
Grand Sorcerer
Posts: 12,160
Karma: 73448616
Join Date: Nov 2007
Location: Toronto
Device: Nexus 7, Clara, Touch, Tolino EPOS
|
@XayneP_G: Look at the line un-wrap setting in the Heuristic Processing options on the PDF conversion. Changing that might help with paragraph detection.
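If you convert from the command line rather than the GUI, the same settings can be passed to calibre's `ebook-convert` tool. The sketch below only builds the command and leaves running it to you. It assumes `ebook-convert` is on your PATH and uses the `--enable-heuristics` and `--unwrap-factor` options; the file names and the helper name `build_convert_command` are placeholders. As far as I understand it, lowering the factor makes calibre unwrap more lines and raising it makes it unwrap fewer, so treat the value as something to experiment with.

```python
import subprocess


def build_convert_command(src: str, dst: str, unwrap_factor: float = 0.45) -> list[str]:
    """Build an ebook-convert command line with line unwrapping tuned.

    unwrap_factor must be a decimal strictly between 0 and 1.
    """
    if not 0 < unwrap_factor < 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert", src, dst,
        "--enable-heuristics",                  # turn on Heuristic Processing
        "--unwrap-factor", str(unwrap_factor),  # PDF input line un-wrap setting
    ]


def convert(src: str, dst: str, unwrap_factor: float = 0.45) -> None:
    """Run the conversion; requires calibre to be installed."""
    subprocess.run(build_convert_command(src, dst, unwrap_factor), check=True)


if __name__ == "__main__":
    # Placeholder file names; prints the command instead of running it.
    print(" ".join(build_convert_command("book.pdf", "book.epub", 0.40)))
```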
|