07-30-2012, 02:12 AM   #2
theducks
Quote:
Originally Posted by XayneP_G
This problem has probably been addressed elsewhere, and I suspect it has an easy solution. However, I have very little experience with coding, and at this point I'm stuck.

I used calibre to convert a PDF file to EPUB. The resulting file had paragraph breaks (<P>) wherever a line of text ended in the PDF. This means a lot of blank lines throughout the ebook I was trying to read.

I found I was able to delete the lines manually with Sigil, but it would be a very time-consuming process to go through the entire text. As the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
Use a regex find and replace in Sigil's Code View (CV).

Think about it: what common pattern distinguishes most false line ends?
A lowercase letter or a comma at the end of a line, with the next line starting in lowercase. (Not perfect: breaks next to quotes or proper names (capitals) will be left alone.)

search: (?sm)([a-z,])</p>\s+<p [^>]+>([a-z])
replace: \1 \2
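
To try the pattern on a small sample before running it on the whole book, here is a minimal Python sketch. The sample markup, the `calibre1` class name, and the `join_false_breaks` helper are assumptions for illustration only, not anything taken from calibre or Sigil. The tag is matched with `[^>]+` so the match cannot run past the closing `>` of a single `<p ...>` tag.

```python
import re

# Search pattern for likely false line ends: a lowercase letter or comma
# before </p>, then a <p ...> tag, then a lowercase letter.
# Assumption: the tag is matched with [^>]+ so it stays inside one tag.
FALSE_BREAK = re.compile(r'(?sm)([a-z,])</p>\s+<p [^>]+>([a-z])')


def join_false_breaks(html: str) -> str:
    """Merge paragraphs that were split at a likely false line end."""
    # \1 and \2 put back the two captured characters with a space between.
    return FALSE_BREAK.sub(r'\1 \2', html)


# Hypothetical sample resembling PDF-to-EPUB output with one <p> per line.
sample = (
    '<p class="calibre1">The quick brown fox jumped over the</p>\n'
    '<p class="calibre1">lazy dog, and then it ran</p>\n'
    '<p class="calibre1">away.</p>\n'
    '<p class="calibre1">A new paragraph starts here.</p>\n'
)

print(join_false_breaks(sample))
```

In this sample the first three lines are merged into one paragraph, while the last paragraph stays separate because the line before it ends in a period and it starts with a capital. As noted, a false break before a proper name or a quote would also be left in place, so a manual pass is still needed afterwards.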