Quote:
Originally Posted by Rev. Bob
I've made that change, and I've found another bug: if HR, BR, or IMG are coded as no-content containers rather than self-closing elements (stupid, but legal), the closing tags are removed but the opening tag is not converted to self-closing.
In other words, <hr></hr> is truncated to a bad <hr> instead of converted to a correct <hr/>.
The culprit seems to be the logic in lines 590-591 of the attached version's modify.py, in which those elements are always assumed to be self-closing:
Code:
elif entity[:3] == '<hr' or entity[:3] == '<br' or entity[:4] == '<img':
    this_entity.e_type = 3
To dodge that bug, I've simply commented that test out for now. As a result, those elements are tested like every other element and the bad-but-okay form is preserved, but it would be nice if <foo a="x" b="y"></foo> could be converted to <foo a="x" b="y"/> across the board. I'm just not sure how to modify your code to do so.
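If you'd rather keep modify.py's entity-by-entity approach than run one substitution over the whole file, one option is to look a single token ahead: when a start tag is immediately followed by its own end tag, emit the self-closing form and skip the end tag. I haven't seen how modify.py splits its input into entities, so the tokenizer and the collapse_empty_pairs name below are hypothetical. It's a minimal sketch that assumes well-formed markup with no '>' inside attribute values.

```python
import re

def collapse_empty_pairs(markup):
    """Rewrite '<tag ...></tag>' as '<tag .../>' (hypothetical helper).

    Sketch only: assumes well-formed markup with no '>' inside attribute values.
    """
    # Split into tags, runs of text, and any stray '<' (kept so nothing is lost).
    tokens = re.findall(r'<[^>]+>|[^<]+|<', markup)
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # A start tag: not an end tag, comment/doctype, PI, or already self-closing.
        is_start_tag = (tok.startswith('<')
                        and not tok.startswith(('</', '<!', '<?'))
                        and not tok.endswith('/>'))
        m = re.match(r'<([^\s/>]+)', tok) if is_start_tag else None
        # Merge only when the very next token is this element's own end tag.
        if m and i + 1 < len(tokens) and re.fullmatch(
                r'</%s\s*>' % re.escape(m.group(1)), tokens[i + 1]):
            out.append(tok[:-1].rstrip() + '/>')
            i += 2
            continue
        out.append(tok)
        i += 1
    return ''.join(out)


print(collapse_empty_pairs('<p><hr></hr><foo a="x" b="y"></foo>text</p>'))
# <p><hr/><foo a="x" b="y"/>text</p>
```

Whitespace between the two tags, as in <hr> </hr>, counts as content here, so such pairs are left untouched.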
After a quick test (and some research), I found that the last conversion can be done quite simply with a regular expression:
Code:
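#!/usr/bin/env python
import re
# <tag attrs></tag> -> <tag attrs/>; \b stops the tag name matching only part of a longer name
pattern = r'<(\w+)\b([^>]*)></\1>'
print(re.sub(pattern, r'<\1\2/>', '<foo a="x" b="y"></foo>'))  # <foo a="x" b="y"/>
print(re.sub(pattern, r'<\1\2/>', '<hr></hr>'))  # <hr/>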
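A regex works on simple markup, but it will miss tags whose attribute values contain a '>' (e.g. alt="1 > 0"). If the files are well-formed XHTML, a different approach is to let a real XML parser do the job: Python's standard xml.etree.ElementTree writes any element with no text and no children in short form on output. This is a swapped-in technique, not what modify.py does, and collapse_with_elementtree is a hypothetical name. Caveats as I understand them: ElementTree writes '<hr />' with a space before the slash, it drops the XML declaration and DOCTYPE, it raises ParseError on markup that isn't well-formed, and for a file with a default XHTML namespace you'd likely also need ET.register_namespace('', 'http://www.w3.org/1999/xhtml') to avoid ns0: prefixes.

```python
import xml.etree.ElementTree as ET

def collapse_with_elementtree(xhtml):
    """Re-serialize well-formed markup so empty elements come out self-closing."""
    root = ET.fromstring(xhtml)
    # short_empty_elements=True writes elements with no text and no children
    # as <tag ... />; encoding='unicode' returns a str instead of bytes.
    return ET.tostring(root, encoding='unicode', short_empty_elements=True)


doc = '<div><p>keep</p><hr></hr><img alt="1 > 0"></img></div>'
print(collapse_with_elementtree(doc))
# <div><p>keep</p><hr /><img alt="1 &gt; 0" /></div>
```

Note that the '>' inside the attribute comes back escaped as &gt;, which is equivalent XML.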