Quote:
Originally Posted by theducks
My un-wrap line Regex
([\w",])</p>\s+<p class="calibre2">([\w"“…])
\1 \2
Letters, Commas, (curly) Quotes
Not Perfect
This should not catch a chapter heading, but it might catch (I am not a writer) stuff that sits between the heading and the first paragraph.
I have fixed up several more books and finally realised that all I should be testing is whether a "line" ends as a well-formed sentence, i.e. with a full stop, a quote, or an exclamation mark.
Anything that does not should not be followed by a </p>.
Previously I'd been looking for lines that began mid-sentence, i.e. with a lower-case letter, but really there is no need to test the first character of the next line; just test the end of the previous "line" to determine whether it is a true "end".
So I am now getting good results with this:
find
([Ia-z,])</p>\s*<p>
replace with \1 plus a single space
which bypasses the calibre tags issue.
I could expand the range to test for digits / capitalized words but have not yet needed to.
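For anyone who wants to try the find/replace pair outside the editor, here is a rough sketch using Python's re module as a stand-in for the editor's search and replace. The names UNWRAP, unwrap_lines and the sample text are made up for illustration. Note that the literal <p> in the pattern only matches tags without attributes, so a tag like <p class="calibre2"> would need the pattern adjusted.

```python
import re

# Pattern from the post: a "line" ending in "I", a lower-case letter or a comma,
# followed by a paragraph break. The final character is captured as group 1.
UNWRAP = re.compile(r'([Ia-z,])</p>\s*<p>')


def unwrap_lines(html: str) -> str:
    """Join paragraphs that were broken mid-sentence (hypothetical helper)."""
    # Replacement is the captured character plus a single space.
    return UNWRAP.sub(r'\1 ', html)


# Made-up sample: the first three "paragraphs" are really one sentence.
sample = (
    '<p>The cat sat on the</p>\n'
    '<p>mat, and then I</p>\n'
    '<p>left.</p>\n'
    '<p>A new paragraph.</p>'
)

result = unwrap_lines(sample)
print(result)
```

The paragraph ending in a full stop is left alone, since only the end of the previous "line" is tested.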