MobileRead Forums > E-Book Software > Sigil

07-29-2012, 11:29 PM   #1
XayneP_G
Device: Kobo Touch
PDF to EPUB conversion

This problem has probably been addressed elsewhere, and I suspect it has an easy solution, but I have very little experience with coding and at this point I'm stuck.

I used calibre to convert a PDF file to EPUB. The resulting file has a paragraph break (<p>) wherever a line of text ended in the PDF, which leaves a lot of blank lines throughout the ebook I was trying to read.

I found I could delete the breaks manually in Sigil, but going through the entire text that way would be very time-consuming. As the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
07-30-2012, 02:12 AM   #2
theducks
Use a regex find and replace in Sigil's Code View (CV).

Think: what common pattern distinguishes most false line ends?
Usually the line ends in a lowercase letter or a comma, and the next line starts with a lowercase letter. (Not perfect: breaks that fall before a quote or a capitalized proper name will be missed.)

search: ([a-z,])</p>\s*<p\b[^>]*>([a-z])
replace: \1 \2
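
For anyone who wants to try the idea outside Sigil first, here is a minimal Python sketch of the same search and replace. It is only an illustration: the sample markup and the `calibre1` class name are made up, and the opening tag is matched with `[^>]*` so the match cannot run past the end of the tag.

```python
import re

# A paragraph ending in a lowercase letter or comma, followed by a paragraph
# starting with a lowercase letter, is treated as a false break.
FALSE_BREAK = re.compile(r'([a-z,])</p>\s*<p\b[^>]*>([a-z])')


def unwrap_false_breaks(html: str) -> str:
    """Join paragraphs that look like they were split at a PDF line end."""
    return FALSE_BREAK.sub(r'\1 \2', html)


# Hypothetical sample of what a PDF conversion might produce.
sample = (
    '<p class="calibre1">The quick brown fox jumped over</p>\n'
    '<p class="calibre1">the lazy dog, and then</p>\n'
    '<p class="calibre1">ran away.</p>\n'
    '<p class="calibre1">A new paragraph starts here.</p>'
)

result = unwrap_false_breaks(sample)
print(result)
```

The first three sample paragraphs are merged into one, while the break after "ran away." is kept because that line ends in a full stop. Work on a copy of the book, since the heuristic will also join genuine paragraphs that happen to fit the pattern.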
07-30-2012, 03:58 AM   #3
Jellby
Use a different converter, or different calibre settings, that do a better job of detecting paragraphs.
07-30-2012, 10:43 AM   #4
PeterT
@XayneP_G: Look at the line un-wrap setting in the Heuristic Processing options for the PDF conversion. Changing it might help with paragraph detection.
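
The same settings can probably be reached from calibre's command-line converter as well. The sketch below only builds and prints the command, it does not run it. It assumes that `ebook-convert` exposes the heuristic options as `--enable-heuristics` and `--html-unwrap-factor` (check `ebook-convert --help` on your version); the file names and the value 0.3 are placeholders, not recommendations.

```python
def build_convert_command(pdf_path, epub_path, unwrap_factor=0.4):
    """Build an ebook-convert command line with heuristic processing enabled.

    The flag names are assumptions about calibre's CLI, not taken from
    the thread.
    """
    if not 0 <= unwrap_factor <= 1:
        raise ValueError("unwrap_factor must be between 0 and 1")
    return [
        "ebook-convert", pdf_path, epub_path,
        "--enable-heuristics",                        # turn heuristic processing on
        "--html-unwrap-factor", str(unwrap_factor),   # line un-wrap setting
    ]


# Placeholder file names, for illustration only.
cmd = build_convert_command("book.pdf", "book.epub", unwrap_factor=0.3)
print(" ".join(cmd))
```

To actually convert, the list could be passed to `subprocess.run(cmd, check=True)` on a machine with calibre installed. Trying a few different factors and comparing the results is likely the quickest way to find a value that suits a particular PDF.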
All times are GMT -4.

