Quote:
Originally Posted by XayneP_G
This problem has probably been addressed elsewhere, and I suspect it has an easy solution, but I have very little experience with coding and at this point I'm stuck.
I used calibre to convert a PDF file to EPUB. The resulting file had paragraph breaks (<P>) wherever a line of text ended in the PDF. This means a lot of blank lines throughout the ebook I was trying to read.
I found I was able to delete the lines manually with Sigil, however it would be a very time consuming process to go through the entire text. As the superfluous paragraph breaks are indistinguishable from the genuine ones, a simple find and replace in the code is not an option either. Is there an easy solution to this problem?
Use a regex find and replace in Code View (CV).
Think about what common pattern distinguishes most false line ends.
One candidate: a line ending in a lowercase letter or a comma, with the next line starting in lowercase. (Not perfect: false breaks where the next line starts with a quote mark or a capitalised proper name will be missed.)
search: ([a-z,])</p>\s*<p\b[^>]*>([a-z])
replace: \1 \2
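
If you would rather try the idea on a copy of the text outside Sigil first, here is a rough Python sketch of the same search and replace. The function name and the sample markup (including the `calibre1` class) are made up for illustration, not taken from your file. The pattern matches the opening tag with `<p\b[^>]*>` so that it stops at the first `>` and cannot run on into the following text.

```python
import re

# Same idea as the search above: a paragraph that ends in a lowercase letter
# or a comma, followed by a paragraph that starts with a lowercase letter.
# <p\b[^>]*> matches one opening <p ...> tag and nothing beyond it.
FALSE_BREAK = re.compile(r"([a-z,])</p>\s*<p\b[^>]*>([a-z])")


def join_false_breaks(xhtml: str) -> str:
    """Merge paragraphs that were split at a PDF line end (hypothetical helper)."""
    # \1 and \2 put back the two characters around the break, separated by a space.
    return FALSE_BREAK.sub(r"\1 \2", xhtml)


# Made-up sample resembling what a PDF conversion might produce.
sample = (
    '<p class="calibre1">The quick brown fox jumped over</p>\n'
    '<p class="calibre1">the lazy dog, and then,</p>\n'
    '<p class="calibre1">without stopping, ran on.</p>\n'
    '<p class="calibre1">A new paragraph starts here.</p>'
)

result = join_false_breaks(sample)
print(result)
```

On this sample the first three paragraphs are merged into one, and the last one stays separate because the line before it ends in a full stop. Run it on a backup copy and check the result by eye, since the same limitations with quotes and proper names apply.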