Old 01-24-2012, 05:27 PM   #3
Serpentine
Evangelist
Posts: 416
Karma: 1045911
Join Date: Sep 2011
Location: Cape Town, South Africa
Device: Kindle 3
Might be a bit overkill:

If you want to find paragraphs that may have been split incorrectly, here's what I've come up with. It needs a little tweaking sometimes, but it generally works rather well. I wouldn't recommend a blanket replace-all unless you grep first and check the results (I think I have an alternative somewhere that ignores span and [bsiu] tags... mmm).

Code:
(?smi)(?<=[^[:punct:]])</p>\s*<p[^<>]*>(?=[\.\-?])|</p>\s*<p[^<>]*>(?!\s*(<[sbui]>|[[:punct:]\s])+[[:upper:]])(?=[[:punct:]\s]+[[:lower:]])|</p>\s*<p[^<>]*>((?=[ \.>]{2,}([[:punct:]]|[[:lower:]]))|(?=[[:lower:]]))|(?<=,)</p>\s*<p[^<>]*>
Replace each match with a single space character; otherwise the last word of one paragraph will be joined directly to the first word of the next.
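
For anyone who would rather try the "grep first, then replace" idea outside the editor, here is a rough Python sketch. It is not the full expression above: Python's `re` module has no POSIX classes such as `[[:punct:]]`, so this uses a simplified stand-in covering only two of the cases (a break followed by a lowercase letter, and a break right after a comma). The names `SPLIT`, `find_candidates`, `join_paragraphs` and the sample text are made up for illustration.

```python
import re

# Simplified stand-in for two branches of the expression above (an assumption,
# not the full pattern):
#   1. a paragraph break followed directly by a lowercase ASCII letter
#   2. a paragraph break preceded directly by a comma
SPLIT = re.compile(r"</p>\s*<p[^<>]*>(?=[a-z])|(?<=,)</p>\s*<p[^<>]*>")


def find_candidates(html, context=20):
    """Return a short snippet around each suspected bad split, for review."""
    snippets = []
    for m in SPLIT.finditer(html):
        start = max(0, m.start() - context)
        end = min(len(html), m.end() + context)
        snippets.append(html[start:end])
    return snippets


def join_paragraphs(html):
    """Replace each suspected bad split with a single space."""
    return SPLIT.sub(" ", html)


# Hypothetical sample: the first break is a bad split, the second is genuine.
sample = (
    "<p>The rain had stopped and</p>\n"
    '<p class="calibre1">the street was quiet.</p>\n'
    "<p>A new paragraph starts here.</p>"
)

if __name__ == "__main__":
    for snippet in find_candidates(sample):
        print(repr(snippet))
    print(join_paragraphs(sample))
```

Checking the output of `find_candidates` before calling `join_paragraphs` is the same precaution as grepping first. Replacing with an empty string instead of a space would produce "andthe" in the sample, which is the joined-words problem mentioned above.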