|07-29-2012, 11:29 PM||#1|
Join Date: Jul 2012
Device: Kobo Touch
PDF to EPUB conversion
This problem has probably been addressed elsewhere, and I suspect it has an easy solution, but I have very little experience with coding and at this point I'm stuck.
I used calibre to convert a PDF file to EPUB. The resulting file had paragraph breaks (<P>) where each line of text ended on the PDF. This means a lot of blank lines through the ebook I was trying to read.
I found I could delete the extra breaks manually in Sigil, but going through the entire text that way would be very time-consuming. As the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
|07-30-2012, 02:12 AM||#2|
Well trained by Cats
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Astak Pocket Pro, K4NT,Galaxy Tab 2
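Think: what common pattern distinguishes most false line ends?
A line that ends in a lower-case letter or a comma, followed by a line that starts in lower case. (Not perfect: false breaks before a line that opens with a quotation mark or a capitalized proper name will be missed.)
search: (?sm)([a-z,])</p>\s+<p\b[^>]*>([a-z])
replace: \1 \2
(The [^>]* keeps the match inside the single opening <p> tag.)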
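For anyone who would rather test the idea outside Sigil first, here is a minimal Python sketch of the same search and replace. It matches the opening tag with `<p\b[^>]*>` so the match cannot run past the tag. The function name `join_false_breaks` and the sample markup (including the `calibre1` class) are made up for illustration; real calibre output may look different.

```python
import re

# Last character of one paragraph (lower-case letter or comma), the closing
# and opening <p> tags between them, and the first character of the next
# paragraph (lower-case letter).
FALSE_BREAK = re.compile(r"([a-z,])</p>\s*<p\b[^>]*>([a-z])")


def join_false_breaks(html: str) -> str:
    """Merge paragraphs that were split mid-sentence (hypothetical helper)."""
    # Keep the two captured characters and put a single space between them.
    return FALSE_BREAK.sub(r"\1 \2", html)


# Illustrative input only, not taken from a real conversion.
sample = (
    '<p class="calibre1">The quick brown fox jumped over</p>\n'
    '<p class="calibre1">the lazy dog, then ran</p>\n'
    '<p class="calibre1">away.</p>\n'
    '<p class="calibre1">A new paragraph starts here.</p>'
)

print(join_false_breaks(sample))
```

The break after "away." is left alone because the line ends in a full stop and the next one starts with a capital, which is the same limitation noted above: a false break before a capitalized word or a quotation mark would also be left in place.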
|07-30-2012, 03:58 AM||#3|
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
|07-30-2012, 10:43 AM||#4|
Join Date: Nov 2007
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
@XayneP_G: Look at the line un-wrap setting in the Heuristic Processing options on the PDF conversion. Changing that might help with paragraph detection.
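If you prefer the command line to the conversion dialog, the same setting can probably be reached through calibre's `ebook-convert` tool. The sketch below only builds the command. It assumes `ebook-convert` is on your PATH and that `--enable-heuristics` and `--html-unwrap-factor` are the right option names for your calibre version (check `ebook-convert --help` to be sure). The helper names and the value 0.3 are illustrative, not a recommendation.

```python
import subprocess


def build_convert_command(pdf_path, epub_path, unwrap_factor=0.3):
    """Build an ebook-convert command line (hypothetical helper).

    Option names are assumed from calibre's command-line documentation;
    verify them against your installed version.
    """
    if not 0 <= unwrap_factor <= 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert",
        pdf_path,
        epub_path,
        "--enable-heuristics",
        "--html-unwrap-factor",
        str(unwrap_factor),
    ]


def convert(pdf_path, epub_path, unwrap_factor=0.3):
    """Run the conversion; raises CalledProcessError if calibre reports failure."""
    subprocess.run(build_convert_command(pdf_path, epub_path, unwrap_factor), check=True)
```

Trying a few different factor values on the same PDF and comparing the results in Sigil is likely the quickest way to find one that suits a particular book.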