MobileRead Forums - View Single Post

cybmole · 01-13-2011, 05:23 PM

Quote:

Originally Posted by theducks

My un-wrap line Regex

([\w",])\s+([\w"“…])

\1 \2

Letters Commas, (curly) Quotes

Not Perfect

Code:

ask
 Samuel if

This should not catch a chapter heading, but it might get (I am not a writer) stuff that is in between the heading and first paragraph.
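For anyone who wants to try the quoted find/replace outside calibre, here is a minimal sketch in Python. The pattern and replacement are copied from the quote as-is. The function name `unwrap_quoted` and the sample strings are made up for illustration.

```python
import re

# The find pattern from the quoted post, unchanged: a word character, double
# quote or comma, then whitespace, then a word character, quote or ellipsis.
FIND = re.compile(r'([\w",])\s+([\w"“…])')

def unwrap_quoted(text: str) -> str:
    """Replace the whitespace between the two captured characters with one space."""
    return FIND.sub(r"\1 \2", text)

print(unwrap_quoted("ask\n Samuel if"))
```

A line that ends in a full stop is left alone, because "." is not in the first character class.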

I have fixed up several more books and finally realised that all I should be testing is whether a "line" ends as a well-formed sentence, i.e. with a full stop, a quote, or an exclamation mark.
Anything that does not should not be followed by a 
Previously I'd been looking for lines that began mid-sentence, i.e. that began with a lower-case letter, but really there is no need to test the first character of the next line. Just test the end of the previous "line" to determine whether it is a true "end".

So I am now getting good results with this:
find
([Ia-z,])\s*
replace with \1 plus a single space

which bypasses the calibre tags issue.
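A rough Python sketch of the same idea, for testing on plain text. This is not the exact calibre expression. As shown in the post the pattern ends in \s*, which can also match nothing, so the sketch assumes the break is a literal newline and requires one. The names `UNWRAP` and `unwrap` and the sample text are invented for the example.

```python
import re

# Assumption: a hard-wrapped "line" ends in a newline character. Requiring the
# newline keeps the pattern from matching inside a line, where \s* alone
# could match zero characters.
UNWRAP = re.compile(r"([Ia-z,])\s*\n\s*")

def unwrap(text: str) -> str:
    """Join a line to the next one when it ends in 'I', a lowercase letter or a comma."""
    return UNWRAP.sub(r"\1 ", text)

sample = 'He said that I\nwould ask\n Samuel if,\nlater, he came.\n"Yes!"\nShe left.'
print(unwrap(sample))
```

Lines that end in a full stop, a quote or an exclamation mark keep their break, since none of those characters is in the class.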

I could expand the range to test for digits / capitalized words, but I have not yet needed to.
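One possible reading of that expansion, sketched in Python: widen the character class so that a line may also end in a digit or any uppercase letter. This is a guess at what the wider range would look like, not something tested on real books. As with any plain-text trial, it assumes the break is a literal newline. `UNWRAP_WIDE` and `unwrap_wide` are hypothetical names.

```python
import re

# Assumption: "digits / capitalized words" is read as "the line may end in a
# digit or an uppercase letter"; the post does not spell this out.
UNWRAP_WIDE = re.compile(r"([A-Za-z0-9,])\s*\n\s*")

def unwrap_wide(text: str) -> str:
    """Join a line to the next one when it ends in a letter, a digit or a comma."""
    return UNWRAP_WIDE.sub(r"\1 ", text)

print(unwrap_wide("In 1984\nthe NASA\nteam met.\nThe End"))
```

The trade-off is that the wider class would also join a heading such as a chapter number or an all-caps title to the text after it, which may be why the narrower range has been enough so far.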