MobileRead Forums - View Single Post - A little help with a regex please, if you don't mind?

timberbeast · 05-07-2014, 02:55 AM

First things first. Thank you very much, Kovid, for your top-notch program. Calibre is powerful as heck, and a lot of fun to use. Then you added the Editor, and it is 3 times as valuable, in my opinion.

I have a book that is really chopped up. There were four lines in the editor for every book page just for headers, but I fixed those using S&R, no problem. The real problem is when I try to remove a bunch of the html tags to clean it up. You can see what I mean:

Code:

<p class="calibre1">Fallon was shaking his head. “Let me tell you what the people in</p>
<p class="calibre1">Washington say is stenciled on that woman’s undies. ‘Virginia Larue’s</p>
<p class="calibre1">Home for Wayward Boys.’ Ginny Larue is a regular one-woman</p>

As you can see, there is just one character in most of them that I need to keep.

I used this to find them.

Code:

 \w</p>\s<p.\w+..\w+..

I would like to use just a *space* to replace them.

Obviously, I can't use S&R to fix them as is without hosing my book. Is there any way I can rewrite the regular expression so that it won't select the last character just before the closing tag?

Thank you.
one of your faithful lurkers,

larry
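
For what it's worth, here is a rough sketch of two common ways to keep that last character, tried in plain Python on the three sample lines above rather than in the Editor itself. It assumes the Editor's regex search accepts the same Python-style syntax, including `\1` in the replace box; the variable names are made up for illustration.

```python
import re

# The three sample lines from the post, joined by newlines.
sample = (
    '<p class="calibre1">Fallon was shaking his head. “Let me tell you what the people in</p>\n'
    '<p class="calibre1">Washington say is stenciled on that woman’s undies. ‘Virginia Larue’s</p>\n'
    '<p class="calibre1">Home for Wayward Boys.’ Ginny Larue is a regular one-woman</p>\n'
)

# Option 1: capture the last character in a group and put it back,
# followed by the space, in the replacement.
with_group = re.sub(r'(\w)</p>\s<p.\w+..\w+..', r'\1 ', sample)

# Option 2: a lookbehind checks that a word character comes first
# without making it part of the match, so the replacement is just a space.
with_lookbehind = re.sub(r'(?<=\w)</p>\s<p.\w+..\w+..', ' ', sample)

print(with_group)
```

Both variants join the three lines into a single paragraph and leave the final `</p>` alone, because nothing after it matches `<p`. In the Editor that would presumably mean either searching for `(\w)</p>\s<p.\w+..\w+..` and replacing with `\1` followed by a space, or searching for `(?<=\w)</p>\s<p.\w+..\w+..` and replacing with a single space.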
