MobileRead Forums - View Single Post - Using Regex to find and remove unwanted mediawiki links

Tex2002ans · 12-20-2022, 04:17 PM

Quote:

Originally Posted by aknight2015

So, that will remove the whole link? <a href="words in here" title="wikilink"></a>?

You usually want to be VERY careful trying to capture/delete "everything between the <a> + </a>"—especially if you're new to regex—because you can sometimes have VERY nasty code (or edge-cases) in your books.

Think something like this.

You are trying to get rid of the first, empty <a> (the "Extra" one):

Code:

<a href="Extra"></a><a href="Correct">Clickable Link</a>

If you aren't careful, regex could accidentally do something like this instead:

Code:

</a><a href="Correct">Clickable Link

You see how you:

Removed the 1st <a>
But the 2nd link's </a> disappeared, leaving the 1st link's </a> stranded?

This is why it's sometimes easier to do things in stages, instead of in one fell swoop.
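Here is a minimal Python sketch of how that accident can happen. The two patterns are my own assumptions (nobody posted the actual regex); they just show a greedy match eating the wrong </a>, next to a lazy one that stops at the right place:

```python
import re

html = '<a href="Extra"></a><a href="Correct">Clickable Link</a>'

# Greedy: (.*) runs to the LAST </a> on the line, so the wrong closing tag is eaten.
greedy = re.sub(r'<a href="Extra">(.*)</a>', r'\1', html)

# Lazy: (.*?) stops at the FIRST </a>, which is the one that belongs to "Extra".
lazy = re.sub(r'<a href="Extra">(.*?)</a>', r'\1', html)

print(greedy)  # </a><a href="Correct">Clickable Link
print(lazy)    # <a href="Correct">Clickable Link</a>
```

Even the lazy version only counts characters, not tags, so it can still pair the wrong </a> if links are ever nested or broken.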

- - - - -

This is where DiapDealer's Sigil plugin would help:

TagMechanic

That makes sure to match every single open <a> with its matching closing </a>.

- - - - -

You would take your original code:

Code:

<a href="BlahBlahBlah" title="wikilink">A link we don't want.</a>
<a href="BlahBlahBlah2" title="wikilink">A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>

Step 1: Clean it (strip the attributes off the links you don't want):

Code:

<a>A link we don't want.</a>
<a>A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>
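One way Step 1 could be done with regex, sketched in Python. The assumption here is that the unwanted links can be recognized by their href starting with "BlahBlahBlah"; in a real book you would swap in whatever actually marks your junk links. The pattern only touches the opening tag, so it never has to guess which </a> belongs to it:

```python
import re

html = """<a href="BlahBlahBlah" title="wikilink">A link we don't want.</a>
<a href="BlahBlahBlah2" title="wikilink">A link we don't want.</a>
<a href="3rd-Example" title="wikilink">A link we want.</a>"""

# Assumption: unwanted links are the ones whose href starts with "BlahBlahBlah".
# Only the OPENING tag is matched; [^>]* cannot run past the end of that tag.
unwanted_open = re.compile(r'<a\s+href="BlahBlahBlah[^"]*"[^>]*>')

# Replace the whole opening tag with a bare <a>, leaving text and </a> alone.
cleaned = unwanted_open.sub('<a>', html)
print(cleaned)
```

The same Find pattern, with <a> as the Replace, should also work in a regex-mode Find & Replace.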

Step 2: Run TagMechanic and choose:

Action Type: Delete
Tag Name: a
Having the Attribute: No attributes ("naked" tag)

and it would find all the bare <a>s (the ones with no attributes) and delete just the tags, leaving the text:

Code:

A link we don't want.
A link we don't want.
<a href="3rd-Example" title="wikilink">A link we want.</a>
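To show the idea behind Step 2, here is a rough Python sketch that removes attribute-less <a> tags while pairing each open tag with its own close. This is NOT TagMechanic's actual code, just an illustration using the standard library's html.parser; the class and function names are made up for this example:

```python
from html.parser import HTMLParser


class NakedAnchorStripper(HTMLParser):
    """Drops <a> tags that have no attributes, keeping their contents."""

    def __init__(self):
        # convert_charrefs=False so entities like &amp; pass through unchanged.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []  # one bool per open <a>: True if it was "naked"

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            naked = not attrs
            self.stack.append(naked)
            if naked:
                return  # skip the bare opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # Pop the matching open <a>; skip this </a> only if its open was naked.
        if tag == "a" and self.stack and self.stack.pop():
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")


def strip_naked_anchors(html):
    parser = NakedAnchorStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(strip_naked_anchors('<a>Unwanted.</a> <a href="3rd-Example">Wanted.</a>'))
# Unwanted. <a href="3rd-Example">Wanted.</a>
```

Because each </a> is checked against the stack, a nested case like <a><a href="x">t</a></a> keeps the inner link whole, which is exactly where a plain regex tends to go wrong.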

- - -

Side Note: I wrote a few TagMechanic tutorials/tips over the years.

It's very helpful for cleaning up code like this.

Want to get rid of all the:

<a class="junk">
empty around everything?

No problem!

Want to convert:

->
->
->

No problem!