The r''' ''' (a raw, triple-quoted string) is probably overkill in this situation, but I've gotten into the habit of using it all the time for regular expressions in Python. '[-–—]' or "[-–—]" would achieve the same thing in this particular instance. Either way, it's still just a string holding the regular expression.
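A quick way to check that claim, as a minimal sketch: the three spellings hold exactly the same characters, and the r prefix only starts to matter once the pattern contains backslashes. The \d pattern below is purely illustrative and isn't part of the code being explained.

```python
# All three literals hold exactly the same characters.
raw_triple = r'''[-–—]'''
single = '[-–—]'
double = "[-–—]"
same = raw_triple == single == double

# The r prefix matters once backslashes appear (illustrative pattern only).
raw_digit = r'\d'        # backslash followed by d
escaped_digit = '\\d'    # same two characters, written without the r prefix
same_digit = raw_digit == escaped_digit

print(same, same_digit)
```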
Code:
text_str = regex.sub(r'''[-–—]''', ' ', match.group(3))
In other words: regex.sub('everything matching this expression', 'with this', 'in this string')
Find all occurrences of - or – or — and replace them with a space in the string contained in the 3rd matching group. Store the results in text_str.
Code:
text_str = regex.sub(r''' {2,}''', ' ', text_str)
Find all occurrences of two or more consecutive spaces and replace them with a single space in the text_str string. Store the results in text_str.
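Here is how those two substitutions behave on a made-up heading, as a rough sketch. It uses the standard library's re module as a stand-in for regex (sub() is called the same way for these patterns), and the sample text is invented for illustration.

```python
import re  # stdlib stand-in for the regex module; sub() is called the same way here

# Hypothetical heading text, just for demonstration.
sample = 'Chapter One—The Beginning - Part  2'

# Step 1: every hyphen, en dash, or em dash becomes a space.
step1 = re.sub(r'''[-–—]''', ' ', sample)

# Step 2: any run of two or more spaces collapses to a single space.
step2 = re.sub(r''' {2,}''', ' ', step1)

print(step2)
```

The second pass is needed because " - " turns into three spaces after the first one.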
Code:
return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)
String formatting/substitution.
Code:
'Hello {0}'.format('there')
Substitute {0} with 'there'
Code:
'Hello {0} {1} {2}, {0}'.format('there', 'you', 10)
Becomes 'Hello there you 10, there'
You don't even need to use numbers if you're not going to repeat anything:
Code:
'Hello {} {} {}, {}'.format('there', 'you', 10, 'you')
You could also use string concatenation:
Code:
return '<' + match.group(1) + match.group(2) + '>' + text_str + '</' + match.group(1) + '>'
But then you have to make sure everything is already a string beforehand. Probably not necessary in this case, but again, it's just a habit I've gotten into to avoid type mismatches (plus I just like it better than the %s %d string substitution method):
Code:
return '<%s%s>%s</%s>' % (match.group(1), match.group(2), text_str, match.group(1))
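To show the kind of type mismatch I mean, here's a small sketch with a made-up integer value. Concatenation refuses to mix str and int, while both formatting styles convert the value for you.

```python
count = 10  # hypothetical non-string value

# Concatenating a str and an int raises TypeError.
error = None
try:
    'Hello ' + count
except TypeError as exc:
    error = str(exc)

# Both formatting styles handle the conversion themselves.
formatted = 'Hello {0}'.format(count)
percent = 'Hello %d' % count

print(error)
print(formatted, '|', percent)
```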
In this particular case:
Code:
return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)
match.group(1) will be the tag name (h1, h2, h3, etc.) and gets plugged into both {0}s.
match.group(2) will be any (optional) attributes (class="foo") and gets plugged into {1}.
text_str is our manipulated content from between the h-tags and gets plugged into {2}.
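Putting the pieces together, a self-contained sketch might look like the following. The search pattern, the function name clean_heading, and the sample HTML are all assumptions for illustration (the actual search expression isn't shown here), and the standard library's re stands in for regex.

```python
import re  # stdlib stand-in for the regex module

def clean_heading(match):
    # Group 3 is the text between the heading tags.
    text_str = re.sub(r'''[-–—]''', ' ', match.group(3))
    text_str = re.sub(r''' {2,}''', ' ', text_str)
    # Group 1 (tag name) fills both {0}s, group 2 (attributes) fills {1}.
    return '<{0}{1}>{2}</{0}>'.format(match.group(1), match.group(2), text_str)

# Assumed pattern: tag name, optional attributes, then the heading content.
HEADING = re.compile(r'<(h[1-6])([^>]*)>(.*?)</\1>', re.DOTALL)

# Made-up input; the hyphen in the paragraph is outside any heading.
html = '<h2 class="foo">Part One—The  Start</h2><p>left-alone</p>'
result = HEADING.sub(clean_heading, html)
print(result)
```

Only the heading content is touched; anything outside the matched h-tags passes through unchanged.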