05-29-2015, 07:30 AM   #3
DiapDealer
Quote:
Originally Posted by phossler
I have a bunch of entries with a Hx followed by the rest of the chapter title in <p> tags

Code:
<h1>Chapter 1:*</h1>
<p>The Title</p>
What I'd like is

Code:
<h1>Chapter 1: The Title</h1>
My Find was *</h1>\s*<p>(.*?)</p>

My Replace was \1</h1>

but instead of just the text up to the first </p>, all the text is selected

What am I doing wrong?
You're including all of the <p> </p> in your find expression. That's why it's included: <p>(.*?)</p>

I'd suggest capturing the chapter number and the p contents with something like:
Code:
<h1>Chapter (\d+):</h1>\s+<p>([^>]*)</p>
Then replace with something like:
Code:
<h1>Chapter \1: \2</h1>
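For anyone who wants to try the find/replace pair outside the editor, here is a minimal sketch using Python's re module. That is an assumption on my part: the editor's regex engine may differ in details. The sample markup and the names FIND, REPLACE, sample and merged are made up for illustration.

```python
import re

# Find and replace expressions from the post above.
FIND = r'<h1>Chapter (\d+):</h1>\s+<p>([^>]*)</p>'
REPLACE = r'<h1>Chapter \1: \2</h1>'

# Hypothetical sample markup: a heading, its title paragraph, then body text.
sample = (
    "<h1>Chapter 1:</h1>\n"
    "<p>The Title</p>\n"
    "<p>First paragraph of the chapter.</p>\n"
)

# \1 is the chapter number, \2 is the contents of the first <p>.
merged = re.sub(FIND, REPLACE, sample)
print(merged)
```

Only the paragraph directly after the heading is folded in. The later <p> is left alone because the expression is anchored on the <h1>.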
The * and + tokens in regex are for repetition (* is zero or more, and + is one or more). They follow something that may have multiple occurrences. Unless you're actually looking to match an asterisk character (in which case it should be escaped: \*), there's no reason to start an expression with "*": there's no indication of what might be repeating.
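A quick way to see the difference between the quantifiers, and what happens with a leading "*", is a small sketch like the one below. It assumes Python's re module, so other engines may report the leading "*" differently. The variable names are only illustrative.

```python
import re

# * allows zero occurrences of the preceding token, + requires at least one.
zero_or_more = re.fullmatch(r'ab*', 'a') is not None   # 'b' may be absent
one_or_more = re.fullmatch(r'ab+', 'a') is not None    # 'b' must appear

# A leading * has nothing before it to repeat, so compiling fails here.
try:
    re.compile(r'*</h1>')
    leading_star_ok = True
except re.error:
    leading_star_ok = False

# Escaped as \*, the asterisk is matched as a literal character.
literal_star = re.search(r'Chapter 1:\*</h1>', '<h1>Chapter 1:*</h1>') is not None

print(zero_or_more, one_or_more, leading_star_ok, literal_star)
```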

Last edited by DiapDealer; 05-29-2015 at 07:42 AM.