Quote:
Originally Posted by mshellberg
When I'm joining split sentences like that, I search for a lowercase letter after the <p> tags...
search: </p>\s*<p>([a-z])
replace: _\1
(Note the space before \1.)
Hope that helps.
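The quoted search/replace can be tried outside Sigil. Below is a minimal Python sketch. It assumes Python's `re` handles this simple pattern the same way Sigil's search does, and that the replacement is a space followed by `\1`, as the note says. `join_lowercase_splits` is a hypothetical helper name.

```python
import re

# Pattern from the quoted post: a closing </p>, any whitespace (including
# the newline between paragraphs), an opening <p>, then a lowercase letter.
JOIN_PATTERN = re.compile(r"</p>\s*<p>([a-z])")


def join_lowercase_splits(html: str) -> str:
    """Merge a paragraph into the previous one when it starts lowercase.

    The replacement is a single space followed by the captured letter
    (group 1), so the two halves become one paragraph.
    """
    return JOIN_PATTERN.sub(r" \1", html)


sample = "<p>He said that</p>\n<p>it would rain.</p>\n<p>It did.</p>"
print(join_lowercase_splits(sample))
```

Only the first pair is merged here, because "It did." starts with a capital letter.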
It helps, but I think I'll have to do what huebi suggested and do several sweeps, looking for something different each time, because I found that not all sentence splits end with a lowercase letter. Some ended with a closing quote but no period, some with a comma, and some with a question or exclamation mark but no closing quote, because the person was still speaking and the speech continued in the next paragraph.
Basically, I was trying to catch all of that in one go. Looking at other examples, I think I have too many commas separating things that don't need them.
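The several-sweeps idea could look roughly like this, with one simple search for each kind of split listed above. This is a rough Python sketch. The four patterns are illustrative guesses based on that list, not something huebi posted, and `count_sweep_hits` is a hypothetical helper. In Sigil, each pattern would simply be a separate Find. Some hits, such as a genuine question ending in `?</p>`, will be false alarms, which is why each sweep gets checked by eye.

```python
import re

# Hypothetical sweep patterns, one per kind of split described above.
# Each is meant to be run as its own search so its hits can be reviewed.
SWEEPS = {
    "next paragraph starts lowercase": r"</p>\s*<p>[a-z]",
    "ends with a quote but no period": r"[a-zA-Z]”</p>",
    "ends with a comma": r",”?</p>",
    "ends with ? or ! and no quote": r"[?!]</p>",
}


def count_sweep_hits(html: str) -> dict:
    """Return how many places each sweep pattern would stop at in html."""
    return {name: len(re.findall(pattern, html)) for name, pattern in SWEEPS.items()}


sample = (
    "<p>He paused,</p>\n"
    "<p>then said “Wait”</p>\n"
    "<p>Really?</p>\n"
    "<p>yes.</p>"
)
for name, hits in count_sweep_hits(sample).items():
    print(f"{name}: {hits}")
```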
[^.^\?^\!][a-zA-Z”\,\?\!+]</p>
It should find ?</p> but not ?”</p>, and z”</p> but not z.”</p>.
Just tested this with the following (found = the search stops there, skipped = it doesn't):
<p>x?</p>    found
<p>x.”</p>   skipped
<p>x?</p>    found
<p>X!</p>    found
<p>x!”</p>   skipped
<p>x,</p>    found
<p>x,”</p>   found
<p>x”</p>    found
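To re-run that test outside Sigil, here is a minimal Python sketch. It assumes Python's `re` treats these plain character classes the same way Sigil's search does. `classify` is a hypothetical helper, and the test lines are copied from the list above.

```python
import re

# The pattern from the post, unchanged. Inside [...] only the leading ^
# negates the set; the later ^ characters and the + are literal characters.
PATTERN = re.compile(r"[^.^\?^\!][a-zA-Z”\,\?\!+]</p>")

TEST_LINES = [
    "<p>x?</p>",
    "<p>x.”</p>",
    "<p>x?</p>",
    "<p>X!</p>",
    "<p>x!”</p>",
    "<p>x,</p>",
    "<p>x,”</p>",
    "<p>x”</p>",
]


def classify(line: str) -> str:
    """Return 'found' if the search would stop on this line, else 'skipped'."""
    return "found" if PATTERN.search(line) else "skipped"


for line in TEST_LINES:
    print(f"{line:12} {classify(line)}")
```

On these lines it reports x.” and x!” as skipped and everything else as found. Those are the two cases where a . or ! sits right before the closing quote.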
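EDIT: The above can be simplified further: [^.?!][a-zA-Z”,?!]</p>
Inside square brackets, only a leading ^ means "not". The other two ^ were just matching a literal caret, and the + a literal plus. Also, ., ?, ! and , don't need a backslash there.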
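So basically we now have this: if one of the 3 characters in [^.?!] (. ? or !) comes right before the last character in [a-zA-Z”,?!] (in practice the closing ”) and the </p>, that find is skipped. Any other paragraph ending in a letter, ”, comma, ? or ! is found.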
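As a quick check that the simplification doesn't change the results, here is a Python sketch comparing both patterns. It assumes Python's `re` matches these classes the same way Sigil does. `is_found` is a hypothetical helper. The last two cases are made up to show the only difference: the old pattern also treated ^ and + as ordinary characters, which hardly ever matters in normal text.

```python
import re

# Original pattern and the simplified one from the EDIT.
ORIGINAL = re.compile(r"[^.^\?^\!][a-zA-Z”\,\?\!+]</p>")
SIMPLIFIED = re.compile(r"[^.?!][a-zA-Z”,?!]</p>")


def is_found(pattern: re.Pattern, line: str) -> bool:
    """True if the search would stop on this paragraph."""
    return pattern.search(line) is not None


CASES = [
    "<p>x?</p>",   # ? with no quote: found by both
    "<p>x.”</p>",  # . right before the closing quote: skipped by both
    "<p>x!”</p>",  # ! right before the closing quote: skipped by both
    "<p>x,”</p>",  # comma before the quote: found by both
    "<p>x.</p>",   # normal ending: . is not in the second set, skipped by both
    "<p>x+</p>",   # only the original matches (literal + in its second set)
    "<p>^a</p>",   # only the simplified matches (original excludes a literal ^)
]

for line in CASES:
    print(f"{line:12} original={is_found(ORIGINAL, line)} simplified={is_found(SIMPLIFIED, line)}")
```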