08-09-2014, 02:54 AM   #17
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by cybmole View Post
I am impressed that it zaps both the opening and the closing tag in a single pass, without needing a \1 replace anywhere
That's part of its simplicity. It's designed to match both the opening and closing 'a' tags themselves, rather than capturing the text in between them and trying to separate that and put it back with a \1.
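For anyone who wants to try it outside of a search-and-replace dialog, here is a minimal sketch in Python. The sample markup and the names LINK_TAG, sample and stripped are made up for illustration, and Python's re module stands in for whatever regex engine your editor uses.

```python
import re

# The expression from this thread; everything else here is illustrative.
LINK_TAG = re.compile(r'</?a ?([^>]+)?>')

sample = '<p>See <a href="ch1.html#n1">note 1</a> for details.</p>'

# Replace every match with nothing: both 'a' tags vanish in one pass,
# and the text between them is never touched, so no \1 is needed.
stripped = LINK_TAG.sub('', sample)
print(stripped)
```

The replacement string is empty because the expression only ever matches the tags themselves.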

Quote:
Originally Posted by cybmole View Post
So can you walk me through HOW it works, please, using the above example?

Certainly. It's all about the optional elements (indicated by the '?'s).
Code:
</?a ?([^>]+)?>
Take the opening portion:
Code:
</?a
The /? makes the slash optional. So that means it matches both the <a of the opening tag and the </a of the closing tag.
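A quick way to see the optional slash at work, sketched in Python (the sample tags and the names opening_part and results are just for illustration):

```python
import re

# Only the opening portion of the expression.
opening_part = re.compile(r'</?a')

results = {}
for tag in ['<a href="x.html">', '</a>', '<p>']:
    m = opening_part.match(tag)
    # Record the matched text, or None when the tag doesn't start with <a or </a.
    results[tag] = m.group(0) if m else None

print(results)
```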

The space that follows is meant as a delimiter, so the expression doesn't match other tags that start with the letter 'a' (abbr, address, area, etc.). It's made optional with the following '?' because the space won't exist in the closing tag.
(NOTE: Because that space is optional, it doesn't actually stop the expression from matching tags like abbr, address, or area: the space can match nothing, and the [^>]+ that follows will swallow the rest of the tag name. Those tags are pretty rare. Still ... that's why I prefer the \M approach instead of the " ?". "a\M" matches the letter a only at the "end of a word." But \M won't work in all flavors of regex.)
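If you want to check the over-matching yourself, here is a small sketch in Python. Python's re module has no \M, so the second pattern swaps in \b (a word boundary) as a stand-in; that variant is my assumption for illustration, not the \M expression itself, and the tag list is made up.

```python
import re

original = re.compile(r'</?a ?([^>]+)?>')   # the expression from this post
bounded = re.compile(r'</?a\b[^>]*>')       # assumed variant using \b instead of \M

tags = ['<a href="x.html">', '</a>', '<abbr title="x">', '<area>', '<address>']

# Collect the tags each pattern matches in full.
original_hits = [t for t in tags if original.fullmatch(t)]
bounded_hits = [t for t in tags if bounded.fullmatch(t)]

print(original_hits)
print(bounded_hits)
```

With the boundary in place, the letter a has to be the whole tag name, so only the real 'a' tags are left in bounded_hits.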

That takes us through
Code:
</?a ?
The [^>] part just means "any character that's not (^) a closing angle bracket (>)". The '+' indicates one or more repetitions of that "any character that's not a closing angle bracket". It's basically a way to match everything up to the next closing angle bracket, while ensuring it can't get "greedy" and run past it.
Code:
[^>]+
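To see why the negated class matters, compare it with a plain ".+" in this Python sketch (the sample line and the names negated and dotted are made up for illustration):

```python
import re

line = '<a href="x.html">link</a> and <b>bold</b>'

# [^>]+ cannot cross a '>', so the match stops at the end of the opening tag.
negated = re.search(r'<a ?[^>]+>', line).group(0)

# .+ is greedy: it runs to the end of the line, then backs up to the LAST '>'.
dotted = re.search(r'<a ?.+>', line).group(0)

print(negated)
print(dotted)
```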
Wrapping the [^>]+ in parentheses just groups it together so that the following question mark makes the entire construct optional (because it won't exist in the closing tag).
Code:
([^>]+)?
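Here is a short Python sketch of what the trailing '?' buys you. The pattern named required is a hypothetical version with the group's '?' removed, shown only for comparison.

```python
import re

required = re.compile(r'</?a ?[^>]+>')      # hypothetical: attributes are mandatory
optional = re.compile(r'</?a ?([^>]+)?>')   # the expression from this post

closing = '</a>'
opening = '<a id="blah" class="blahdeblah">'

# Without the '?', [^>]+ needs at least one character before the '>',
# and the closing tag has nothing to offer.
closing_required = required.fullmatch(closing) is not None
closing_optional = optional.fullmatch(closing) is not None
opening_optional = optional.fullmatch(opening) is not None

print(closing_required, closing_optional, opening_optional)
```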


So put it all together and it will match </a> as well as:
Code:
<a id="blah" class="blahdeblah" href="blahdedblahdeblah.html#doohickey">
Basically anything that starts with '<a' or '</a' and everything else that may be present, up to and including the next '>'.

Last edited by DiapDealer; 08-09-2014 at 03:03 AM.