MobileRead Forums - View Single Post

DiapDealer · 08-09-2014, 02:54 AM

Quote:

Originally Posted by cybmole

I am impressed that it zaps both the opening and the closing tag, in a single pass, and without needing a \1 replace anywhere

That's part of its simplicity. It's designed to match both the opening and closing 'a' tags themselves, rather than capturing the text in between them and trying to separate that and put it back with a \1.

Quote:

Originally Posted by cybmole

so can you walk me through HOW it works, please, using the above example.

Certainly. It's all about the optional elements (indicated by the '?'s).

Code:

</?a ?([^>]+)?>

Take the opening portion:

Code:

</?a

The /? makes the slash optional. So that means it matches both the <a of the opening tag and the </a of the closing tag.
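As a quick check of the optional slash, here is a minimal sketch using Python's re module (an assumption; the thread doesn't name a language, and the sample string is made up for illustration). It runs only the opening portion of the expression against a tiny link.

```python
import re

# Opening portion only: the slash is optional, so both tag starts match.
opening = re.compile(r"</?a")

# Hypothetical sample markup, not taken from the thread.
sample = '<a href="x.html">link</a>'

starts = [m.group(0) for m in opening.finditer(sample)]
print(starts)  # ['<a', '</a']
```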

The space that follows is meant as demarcation so it doesn't match any other tags that might start with the letter 'a' (address, abbr, area, etc.). It's made optional with the following '?' because the space won't exist in the closing tag.
(NOTE: I can't guarantee it won't match tags like address, abbr, or area because I frankly haven't tried it, and I suspect it might. But those tags are pretty rare. Still ... that's why I prefer the \M approach instead of the " ?". "a\M" matches the letter a at the "end of a word." But \M won't work in all flavors of regex.)
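The suspicion in the note can be tested directly. The sketch below assumes Python's re module, which has no \M, so \b (a generic word boundary) stands in for it here. That swap is an assumption for illustration, not the flavor used in the thread. The tag list is also made up.

```python
import re

# The expression from the post, with the optional space.
original = re.compile(r"</?a ?([^>]+)?>")

# Assumption: \b (word boundary) substitutes for \M, which Python's re lacks.
bounded = re.compile(r"</?a\b([^>]+)?>")

# Hypothetical test tags.
tags = ['<a href="x">', "</a>", "<abbr>", "<area>", "<address>"]

matched_original = [t for t in tags if original.fullmatch(t)]
matched_bounded = [t for t in tags if bounded.fullmatch(t)]

print(matched_original)  # all five tags, including abbr, area and address
print(matched_bounded)   # only the two 'a' tags
```

With the space skipped, [^>]+ is free to consume the rest of a longer tag name, which is why the optional-space version also catches abbr, area and address in this test. The boundary version rejects them because there is no word boundary between the 'a' and the letter that follows it.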

That takes us through

Code:

</?a ?

The [^>] part just means "any character that's not (^) a closing angle brace (>)". The '+' is to indicate one or more repetitions of that "any character that's not a closing angle brace". It's basically a way to capture anything up to the closing angle brace, while ensuring it doesn't get "greedy" and go beyond the next closest angle brace.

Code:

[^>]+
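To see why the negated class stops where it does, here is a small comparison against a plain greedy dot, again sketched in Python's re module as an assumption. The sample text and the simplified `<a` prefix are for illustration only.

```python
import re

# Hypothetical sample with more than one tag on the line.
text = '<a href="x.html">link</a> and <b>bold</b>'

# [^>]+ cannot cross a '>', so the match ends at the first one.
negated = re.search(r"<a[^>]+>", text).group(0)

# A greedy dot happily runs on to the last '>' on the line.
greedy_dot = re.search(r"<a.+>", text).group(0)

print(negated)     # <a href="x.html">
print(greedy_dot)  # <a href="x.html">link</a> and <b>bold</b>
```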

Wrapping the [^>]+ in parentheses just groups it together so that the following question mark makes the entire construct optional (because it won't exist in the closing tag).

Code:

([^>]+)?
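A short sketch of the optional group in action, assuming Python's re module (the sample tags are made up). For the opening tag the group holds the attributes; for the closing tag the group simply doesn't take part in the match.

```python
import re

pattern = re.compile(r"</?a ?([^>]+)?>")

opening = pattern.fullmatch('<a id="blah">')
closing = pattern.fullmatch("</a>")

# The group captured the attributes of the opening tag.
print(opening.group(1))  # id="blah"

# The optional group was skipped entirely for the closing tag.
print(closing.group(1))  # None
```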

So put it all together and it will match </a> as well as:

Code:

<a id="blah" class="blahdeblah" href="blahdedblahdeblah.html#doohickey">

Basically anything that starts with '<a' or '</a' and everything else that may be present, up to and including the next '>'.
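Putting it together as a single-pass replace, here is a minimal sketch assuming Python's re.sub. The function name strip_anchor_tags and the surrounding paragraph markup are hypothetical; only the expression and the example link come from the post.

```python
import re

def strip_anchor_tags(html: str) -> str:
    """Remove opening and closing 'a' tags, keeping the text between them."""
    # Replace with an empty string: no \1 is needed because the link text
    # is never part of the match in the first place.
    return re.sub(r"</?a ?([^>]+)?>", "", html)

sample = (
    '<p>See <a id="blah" class="blahdeblah" '
    'href="blahdedblahdeblah.html#doohickey">this page</a> for more.</p>'
)

cleaned = strip_anchor_tags(sample)
print(cleaned)  # <p>See this page for more.</p>
```

The p tags survive because the expression requires the letter 'a' straight after the '<' or '</'.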