Thread: Regex examples
Old 02-06-2022, 12:43 PM   #699
DiapDealer
</?a ?([^>]+)?>

The question marks are used to mark what comes before as optional.

So </?a is saying that the slash before the 'a' tag is optional. That means it matches both "<a" and "</a".

Then comes the space, which is also made optional, meaning it will match "<a", or "<a ".
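
For anyone who wants to poke at this outside of an editor's search box, here's a minimal sketch using Python's re module (my choice for illustration; the expression itself isn't Python-specific). It tests only the first part of the expression, and the sample strings are made up.

```python
import re

# Only the opening part of the expression:
# optional slash, the letter a, optional space.
prefix = re.compile(r"</?a ?")

samples = ["<a", "</a", "<a ", "</a "]

# fullmatch() succeeds only if the whole sample is consumed by the pattern.
results = [prefix.fullmatch(s) is not None for s in samples]
print(results)

# A different tag name does not match at all.
other = prefix.fullmatch("<b")
print(other)
```

All four samples should match, while "<b" should not.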

The ([^>]+)? is a little more tricky, but not terribly so. The parentheses group everything before the last question mark, meaning the whole of what's inside the parentheses is optional.

"[^>]" is a common character class when trying to parse html tags. It simply means that it will match any character that is not (^) the greater-than character (>). It's used to ensure that the expression does not get greedy and grab content beyond the ending of the current tag (>). The + and * are for repetition: + means one or more times, and * means zero or more times.
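
To see why [^>] matters, here's a rough Python sketch comparing it with a plain dot. The sample text is invented for the example.

```python
import re

text = '<a href="x.html">link</a> and <b>bold</b>'

# [^>]+ cannot cross a >, so the match ends at the close of the first tag.
stops_at_tag_end = re.search(r"<a[^>]+>", text).group()

# .+ is greedy and happily crosses > characters,
# so it runs on to the last > it can find.
runs_past_tag_end = re.search(r"<a.+>", text).group()

print(stops_at_tag_end)
print(runs_past_tag_end)
```

The first search should return just the opening tag; the second swallows everything up to the final > in the text.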

The use of + in this case is why the grouping parentheses and the question mark (to make the whole thing optional) are necessary. In this particular case, the optional space character and the ([^>]+)? could be replaced with simply [^>]* (meaning match all characters except > zero or more times, instead of all characters except > one or more times... optionally). That works because a space is itself a character that is not >, so [^>]* covers the optional space too.
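
A quick way to convince yourself of that is to run both tails against the same inputs. This is only a sketch in Python with a handful of made-up samples, not a proof.

```python
import re

# The two alternative "tails" that sit between "<a" and the closing ">".
long_tail = re.compile(r" ?([^>]+)?")
short_tail = re.compile(r"[^>]*")

samples = ["", " ", ' href="x.html"', ' id="anchor_tag_1" /', "bbr", " a > b"]

# For every sample, both tails should either match the whole string or both fail.
agree = all(
    (long_tail.fullmatch(s) is None) == (short_tail.fullmatch(s) is None)
    for s in samples
)
print(agree)
```

One difference that does remain: the longer form has a capturing group, so a replacement that refers to \1 would only make sense with that version.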

Then match the closing > character.

</?a ?([^>]+)?>

should be synonymous with:

</?a[^>]*>

for the stripping of all opening and closing anchor tags (as well as any self-closing anchor tags of the variety: <a id="anchor_tag_1" />)
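
Here's a small Python sketch of the stripping itself, run with both expressions on an invented snippet. The last pattern, with \b added, is my own assumption and not part of the expressions above; I include it only because both expressions as written will also match any other tag whose name starts with "a" (abbr, for example).

```python
import re

original = re.compile(r"</?a ?([^>]+)?>")
simplified = re.compile(r"</?a[^>]*>")

html = '<p>See <a href="ch1.xhtml">chapter 1</a><a id="anchor_tag_1" /> now.</p>'

# Replace every match with nothing, leaving the link text behind.
stripped_original = original.sub("", html)
stripped_simplified = simplified.sub("", html)
print(stripped_original)
print(stripped_simplified)

# Caveat: tag names that merely start with "a" are matched as well.
abbr = '<abbr title="et cetera">etc.</abbr>'
abbr_stripped = simplified.sub("", abbr)
print(abbr_stripped)

# Hypothetical tightening (an assumption, not from the post):
# \b requires the tag name to end right after the "a".
tightened = re.compile(r"</?a\b[^>]*>")
abbr_kept = tightened.sub("", abbr)
print(abbr_kept)
```

Both expressions should give the same stripped result for the anchor tags. Whether the extra \b is worth it depends on whether your files actually contain tags like abbr or address.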

But no need to change what works. I included the slight simplification for explanatory purposes.

Last edited by DiapDealer; 02-06-2022 at 12:49 PM.