MobileRead Forums - View Single Post - Regular expression for matching div tags?

kiwidude · 12-11-2010, 06:09 AM

Hi all,

Spent way too much time on this without success so hopefully a regex guru can help me.

I have an xhtml document in Sigil that has a lot of nasty formatting that I want to remove. Specifically it has a series of <div> tags surrounding sets of paragraphs.

I have been trying to do a find/replace and the issue I have is trying to do a "non-greedy" match. The text looks like the following (it does not nest div tags):

Code:

 <div class="s4">
    <p class="calibre4">Blah blah</p>
    <p class="calibre4">Blah blah</p>
    <p class="calibre4">Blah blah</p>
 </div>
 <div class="s6">
    <p class="calibre4">Blah blah</p>
 </div>
 <div class="s4">
    <p class="calibre4">Blah blah</p>
 </div>

Now let's say I am only interested in selecting the <div class="s4"> blocks and stripping their outer div tags.

What regex should I use? I've looked into negative lookaheads as well as non-greedy matches but my head hurts from lack of success. At its simplest I had hoped I could use something like:
Find: <div class="s4">(.*?)</div>
Replace: \1

However, that doesn't work. Could someone please suggest something? Worst case, I will just remove the class from the div tags so they do nothing, but it has now reached the point of insulting my pride if I let it completely beat me.
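
For anyone wanting to reproduce the problem outside Sigil, here is a minimal sketch using Python's `re` module. It is not Sigil's own engine, and how Sigil's find/replace treats newlines is not something this post establishes. In many regex engines `.` does not match a newline by default, so a lazy `(.*?)` can never reach a `</div>` that sits on a later line; whether that is what happens in Sigil is an assumption. The variable names `html`, `pattern`, `no_flag` and `stripped` are purely illustrative.

```python
import re

# Sample markup copied from the post above.
html = '''<div class="s4">
    <p class="calibre4">Blah blah</p>
    <p class="calibre4">Blah blah</p>
    <p class="calibre4">Blah blah</p>
</div>
<div class="s6">
    <p class="calibre4">Blah blah</p>
</div>
<div class="s4">
    <p class="calibre4">Blah blah</p>
</div>
'''

# The Find expression from the post, unchanged.
pattern = r'<div class="s4">(.*?)</div>'

# Default mode: "." stops at a newline, so a div that spans
# several lines is never matched.
no_flag = re.findall(pattern, html)

# DOTALL mode: "." also matches newlines. The lazy ".*?" then stops at
# the first </div>, which is the right one because the divs do not nest.
stripped = re.sub(pattern, r'\1', html, flags=re.DOTALL)

print(no_flag)
print(stripped)
```

In Python the same effect is available inline by writing `(?s)` at the start of the pattern. Many other engines accept that flag too, but whether Sigil honours it is an assumption to test on a copy of the file first.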
