07-29-2012, 11:29 PM   #1
XayneP_G
Junior Member
Device: Kobo Touch
PDF to EPUB conversion

This problem has probably been addressed elsewhere, and I suspect it has an easy solution, but I have very little experience with coding and at this point I'm stuck.

I used calibre to convert a PDF file to EPUB. The resulting file has a paragraph break (<p>) wherever a line of text ended in the PDF, which leaves a lot of blank lines throughout the ebook I was trying to read.

I found I could delete the breaks manually in Sigil, but going through the entire text that way would be very time-consuming. As the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
07-30-2012, 02:12 AM   #2
theducks
Grand Sorcerer
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Use a regex find and replace in Code View (CV).

Think: what common pattern distinguishes most false line ends?
A line that ends in a lowercase letter or a comma, with the next line starting in lowercase. (Not perfect: false breaks before a line that starts with a quotation mark or a capitalized proper name will be missed.)

search: ([a-z,])</p>\s*<p\b[^>]*>([a-z])
replace: \1 \2
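
If you want to try the idea outside Sigil first, here is a minimal Python sketch of the same find and replace. It assumes the false breaks look like `</p>` followed by a new `<p ...>` tag; the sample text and the `calibre1` class name are made up for illustration. The opening tag is matched with `<p\b[^>]*>` so the match cannot run past the tag's closing `>`.

```python
import re

# A paragraph that ends in a lowercase letter or comma, followed by a
# paragraph that starts with a lowercase letter, is treated as a false break.
FALSE_BREAK = re.compile(r"([a-z,])</p>\s*<p\b[^>]*>([a-z])")


def join_false_breaks(html: str) -> str:
    """Merge paragraphs that appear to be split in mid-sentence."""
    # \1 and \2 put back the two characters captured around the break.
    return FALSE_BREAK.sub(r"\1 \2", html)


# Hypothetical sample of what a PDF conversion might produce.
sample = (
    '<p class="calibre1">The quick brown fox jumped over</p>\n'
    '<p class="calibre1">the lazy dog, and then</p>\n'
    '<p class="calibre1">ran away.</p>\n'
    '<p class="calibre1">A new paragraph starts here.</p>\n'
)

print(join_false_breaks(sample))
```

The first three lines end up in one paragraph, and the last one is left alone because the line before it ends in a full stop. Work on a copy of the book, since the pattern will also join the occasional genuine paragraph.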
07-30-2012, 03:58 AM   #3
Jellby
frumious Bandersnatch
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Use a different converter, or different calibre settings, that do a better job of detecting paragraphs.
07-30-2012, 10:43 AM   #4
PeterT
Taking a break; Fed up
Location: Toronto
Device: Wife: Touch, Arc, Vox Me: Nexus 7, Glo
@XayneP_G: Look at the line un-wrap setting in the Heuristic Processing options for the PDF conversion. Changing it might help with paragraph detection.
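
For anyone who prefers the command line, the same settings can be tried through calibre's `ebook-convert` tool. The sketch below only builds the command. The option names `--enable-heuristics` and `--unwrap-factor` are given as I recall them from calibre's command-line documentation, so check them against the help output of your installed version. The file names and the value 0.3 are placeholders, not recommendations.

```python
import subprocess


def build_convert_command(pdf_path, epub_path, unwrap_factor=0.45):
    """Build an ebook-convert command with heuristic processing enabled.

    Assumption: --unwrap-factor is the PDF input option for line
    un-wrapping and takes a decimal between 0 and 1.
    """
    if not 0 <= unwrap_factor <= 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert", pdf_path, epub_path,
        "--enable-heuristics",
        "--unwrap-factor", str(unwrap_factor),
    ]


# Placeholder file names; nothing is converted until the command is run.
cmd = build_convert_command("book.pdf", "book.epub", unwrap_factor=0.3)
print(" ".join(cmd))

# To actually run the conversion (requires calibre to be installed):
# subprocess.run(cmd, check=True)
```

Converting the same PDF with a few different factors and comparing the results in Sigil is probably the quickest way to see which value suits a given book.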