I suppose you could...
In strip_span_for_page(), add the line
Code:
# \w+ captures only the tag name, so <span class="x"></span> matches as well as <span></span>
html_text = re.sub(r'<(\w+)\b[^>]*></\1>', '', html_text)
OR
Code:
# Same match, but rewrite the element as self-closing and keep its attributes
html_text = re.sub(r'<(\w+)\b([^>]*)></\1>', r'<\1\2/>', html_text)
before the line
Code:
entities = re.split(r'(<.+?>)', html_text)
The first strips empty elements completely. The second turns them into self-closing tags (for example <span class="x"/>), which you could then catch later with your 'if equals...' check.
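If you want to try both options outside the plugin first, here is a small standalone sketch. It is not the plugin's code: strip_empty, self_close_empty, split_entities and drop_self_closed_spans are hypothetical helper names, and the sample string is made up. It uses the pattern <(\w+)\b([^>]*)></\1>, which captures only the tag name, so empty tags with attributes match as well. Your 'if equals...' test isn't shown here, so the last helper uses a hypothetical startswith/endswith check in its place.

```python
import re

# Empty element: the tag name is captured on its own, so attributes are
# allowed but the closing tag must repeat the same name.
EMPTY_ELEMENT = re.compile(r'<(\w+)\b([^>]*)></\1>')


def strip_empty(html_text):
    # Option 1: remove empty elements such as <span class="x"></span>.
    return EMPTY_ELEMENT.sub('', html_text)


def self_close_empty(html_text):
    # Option 2: rewrite <span class="x"></span> as <span class="x"/>.
    return EMPTY_ELEMENT.sub(r'<\1\2/>', html_text)


def split_entities(html_text):
    # Same split as in strip_span_for_page(): the capturing group keeps each
    # tag as its own list item, alternating with the text between tags.
    return re.split(r'(<.+?>)', html_text)


def drop_self_closed_spans(html_text):
    # Hypothetical stand-in for the 'if equals...' check: skip any
    # self-closed span produced by option 2 and keep everything else.
    kept = []
    for entity in split_entities(self_close_empty(html_text)):
        if entity.startswith('<span') and entity.endswith('/>'):
            continue
        kept.append(entity)
    return ''.join(kept)


sample = '<p>One<span class="x"></span> two<span></span></p>'
print(strip_empty(sample))             # <p>One two</p>
print(self_close_empty(sample))        # <p>One<span class="x"/> two<span/></p>
print(drop_self_closed_spans(sample))  # <p>One two</p>
```

Both routes end up with the same text here. One thing to keep in mind with option 2: if the page is ever read as plain HTML rather than XHTML, browsers ignore the slash in <span/> and treat it as an opening tag, so those entities should be dropped before the page is written back out.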
I'm trying to think whether there are any tags this would strip that you shouldn't strip.
Are there any?
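A few candidates come to mind. An empty <a id="..."></a> (or name="...") can be the target of a link or a table-of-contents entry. An empty <td></td> keeps table columns lined up. A <script src="..."></script> is empty by design. One way to handle that is a keep-list. Below is a minimal sketch; the list contents and the names KEEP_WHEN_EMPTY and strip_empty_elements are assumptions to adjust for your books, not a complete answer. It also repeats the substitution until nothing changes, because a single pass over <b><i></i></b> only removes the inner pair.

```python
import re

# Hypothetical keep-list: elements that can matter even when empty.
# This is an assumption to tune per book, not an exhaustive list.
KEEP_WHEN_EMPTY = {'a', 'td', 'th', 'script', 'textarea', 'iframe'}

EMPTY_ELEMENT = re.compile(r'<(\w+)\b([^>]*)></\1>')


def _drop_unless_kept(match):
    # Return kept elements unchanged; replace everything else with ''.
    if match.group(1).lower() in KEEP_WHEN_EMPTY:
        return match.group(0)
    return ''


def strip_empty_elements(html_text):
    # Loop until a pass changes nothing, so nested empties such as
    # <b><i></i></b> are removed from the inside out.
    while True:
        stripped = EMPTY_ELEMENT.sub(_drop_unless_kept, html_text)
        if stripped == html_text:
            return stripped
        html_text = stripped


print(strip_empty_elements('<p><b><i></i></b><a id="note1"></a>text</p>'))
# <p><a id="note1"></a>text</p>
```

The loop compares the string before and after each pass instead of using the count from re.subn: kept elements are still "substituted" (with themselves), so that count would never reach zero and the loop would not end.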