Old 12-27-2010, 11:57 AM   #5
Danger
Evangelist
Quote:
Originally Posted by mshellberg
When I'm joining split sentences like that, I search for a lowercase letter after the <p> tags...

search: </p>\s*<p>([a-z])
replace: _\1
(Note the space before \1.)

Hope that helps.
It helps, but I think I have to do as huebi suggested and make several sweeps, looking for different things each time, because I found that not all sentence splits end with a lowercase letter. Some ended with quotes but no period, some with a comma, and some with a question or exclamation mark but no closing quotes, because the speaker was still talking and the speech continued in the next paragraph.
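A minimal sketch of the quoted search and replace, assuming Python's re module rather than the editor's own find dialog. The sample text and the name join_lowercase_splits are made up for illustration.

```python
import re

# The quoted search pattern: a paragraph break followed by a lowercase letter.
JOIN_LOWERCASE = re.compile(r"</p>\s*<p>([a-z])")


def join_lowercase_splits(html: str) -> str:
    """Merge a paragraph into the previous one when it starts with a lowercase letter."""
    # The replacement is a space followed by the captured letter.
    return JOIN_LOWERCASE.sub(r" \1", html)


# Hypothetical sample: the first split resumes with a lowercase letter,
# the second resumes with an opening quote.
sample = (
    "<p>He walked to the</p>\n"
    "<p>door and left.</p>\n"
    "<p>“Wait,” she said,</p>\n"
    "<p>“I am not done.”</p>"
)

result = join_lowercase_splits(sample)
print(result)
```

The first split is joined, but the one where the next paragraph opens with a quote is left alone, which is why one sweep is not enough.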

Basically I was trying to catch all of that in one go. Looking at other examples, I think I have too many commas separating things that don't need them.

[^.^\?^\!][a-zA-Z”\,\?\!+]</p>

Should find ?</p> but not ?”</p>, and z”</p> but not z.”</p>

Just tested this with the following (each line is marked found or skipped):
<p>x?</p> (found)
<p>x.”</p> (skipped)
<p>x?</p> (found)
<p>X!</p> (found)
<p>x!”</p> (skipped)
<p>x,</p> (found)
<p>x,”</p> (found)
<p>x”</p> (found)

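EDIT: The above can be simplified further, since inside square brackets only a leading ^ negates and the ?, ! and comma need no escaping: [^.?!][a-zA-Z”,?!]</p>
So basically we now have: if any of the 3 characters excluded by [^.?!] appears right before one of these characters [a-zA-Z”,?!] (specifically the ”) followed by </p>, then that find is skipped.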
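A quick way to check the simplified pattern outside the editor, sketched here with Python's re module (an assumption; the editor's regex flavor may differ in other respects). The names PATTERN, samples and classify are only for illustration.

```python
import re

# Simplified pattern from the edit above: any character except . ? !
# followed by a letter, closing quote, comma, ? or ! and then </p>.
PATTERN = re.compile(r"[^.?!][a-zA-Z”,?!]</p>")

samples = [
    "<p>x?</p>",
    "<p>x.”</p>",
    "<p>X!</p>",
    "<p>x!”</p>",
    "<p>x,</p>",
    "<p>x,”</p>",
    "<p>x”</p>",
]


def classify(lines):
    """Map each line to True if the pattern finds it, False if it is skipped."""
    return {line: bool(PATTERN.search(line)) for line in lines}


results = classify(samples)
for line, found in results.items():
    print(line, "found" if found else "skipped")
```

Only the two lines where the closing quote follows a period or exclamation mark are skipped. Everything else is reported as a possible split.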

Last edited by Danger; 12-27-2010 at 03:57 PM.