01-13-2011, 05:23 PM   #13
cybmole
Quote:
Originally Posted by theducks
My un-wrap line Regex

([\w",])</p>\s+<p class="calibre2">([\w"“…])

\1 \2

Letters Commas, (curly) Quotes

Not Perfect
Code:
ask
 Samuel if
This should not catch a chapter heading, but it might catch (I am not a writer) stuff that sits between the heading and the first paragraph.
I have fixed up several more books and finally realised that all I should be testing is whether a "line" ends as a well-formed sentence, i.e. with a full stop, a quote, or an exclamation mark.
Anything that does not should not be followed by a </p>.
Previously I'd been looking for lines that began mid-sentence, i.e. that began with a lower-case letter, but there is really no need to test the first character of the next line. Just test the end of the previous "line" to determine whether it is a true "end".

So I am now getting good results with this.
Find:
([Ia-z,])</p>\s*<p>
Replace with: \1 plus a single space

This bypasses the calibre tags issue.
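
For anyone who wants to try the find/replace outside an editor, here is a minimal Python sketch of it. The function name `unwrap` and the sample text are made up for illustration; only the pattern and the replacement come from the post above.

```python
import re

# Pattern from the post: a "line" that ends in I, a lower-case letter,
# or a comma, immediately followed by a paragraph break.
UNWRAP = re.compile(r'([Ia-z,])</p>\s*<p>')

def unwrap(html: str) -> str:
    """Join paragraphs that were split mid-sentence (hypothetical helper)."""
    # \1 keeps the last character of the line; the space replaces the break.
    return UNWRAP.sub(r'\1 ', html)

# Made-up sample: the first break is mid-sentence, the second is a real one.
sample = '<p>ask</p>\n<p>Samuel if he is coming.</p>\n<p>He said yes.</p>'
print(unwrap(sample))
```

Note that the pattern as written only matches a bare `<p>`. A tag such as `<p class="calibre2">` is left alone, so this assumes the paragraphs carry no attributes.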

I could expand the range to test for digits or capitalized words, but I have not yet needed to.
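
If the range ever does need expanding, one way to sketch it in Python is to build the pattern from a character class passed in as a string. The name `make_unwrap` and the sample text are hypothetical, and only the digit case is shown. Adding `A-Z` would also join lines that end in a capital letter, such as all-caps headings, so that is best tested on a copy first.

```python
import re

def make_unwrap(line_end_chars: str = 'Ia-z,'):
    """Return an unwrap function for the given character class (hypothetical helper).

    line_end_chars is placed inside [...] unescaped, so it must be a valid
    regex character-class body such as 'Ia-z,' or 'Ia-z,0-9'.
    """
    pattern = re.compile(r'([' + line_end_chars + r'])</p>\s*<p>')
    return lambda html: pattern.sub(r'\1 ', html)

unwrap_default = make_unwrap()               # same class as in the post
unwrap_with_digits = make_unwrap('Ia-z,0-9')  # also treats a trailing digit as mid-sentence

sample = '<p>It happened in 1984</p>\n<p>and nobody noticed.</p>'
print(unwrap_default(sample))      # left unchanged, since "4" is not in the class
print(unwrap_with_digits(sample))  # the two paragraphs are joined
```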