MobileRead Forums - View Single Post

Perkin · 04-07-2014, 12:27 PM

Quote:

Originally Posted by Rev. Bob

I've made that change, and I've found another bug: if HR, BR, or IMG are coded as no-content containers rather than self-closing elements (stupid, but legal), the closing tags are removed but the opening tag is not converted to self-closing.

In other words, <hr></hr> is truncated to a bad <hr> instead of converted to a correct <hr/>.

The culprit seems to be the logic in lines 590-591 of the attached version's modify.py, in which those elements are always assumed to be self-closing:

Code:

elif entity[:3] == '<hr' or entity[:3] == '<br' or entity[:4] == '<img':
    this_entity.e_type = 3

To dodge that bug, I've simply commented that test out for now. Thus, those elements are tested like every other element, and the bad-but-okay form is preserved - but it would be nice if <foo a="x" b="y"></foo> could be converted to <foo a="x" b="y"/> across the board. I'm just not sure how to modify your code to do so.
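Instead of commenting the test out, one option might be to decide "self-closing or not" from how the tag is actually written, not from the element name. The sketch below assumes each entity is a single opening-tag string such as `'<hr>'` or `'<hr/>'`. The helper `is_self_closing_tag` is hypothetical and not part of modify.py, and how its result would map onto the `e_type` values is left open.

```python
import re

# Hypothetical helper (not from modify.py): a tag counts as self-closing
# only if it is actually written that way, i.e. ends in '/>' (optionally
# with whitespace before the '>'), whatever the element name is.
SELF_CLOSING = re.compile(r'^<[\w:.-]+(?:\s[^>]*)?/\s*>$')

def is_self_closing_tag(entity):
    """True for '<hr/>', '<br />', '<img src="a.jpg"/>'; False for '<hr>'."""
    return SELF_CLOSING.match(entity.strip()) is not None

for tag in ['<hr/>', '<br />', '<img src="a.jpg"/>', '<hr>', '<img src="a.jpg">']:
    print('%-20s %s' % (tag, is_self_closing_tag(tag)))
```

With a check like this, `<hr>` followed by `</hr>` would be handled as an ordinary container, which is what commenting the test out achieves, while a genuine `<hr/>` could still be classified as self-closing.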

After a quick test (and a bit of research), it looks like that last conversion can be done quite simply...

Code:

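#!/usr/bin/env python

import re

# Collapse an empty <tag ...></tag> pair into a self-closing <tag .../>.
# Group 1 is the opening tag minus its '>'; group 2 has to reappear
# (via \2) as the name in the closing tag.
result = re.sub(r'(<(.*)[^>]+)></\2>', r'\1/>', '<foo a="x" b="y"></foo>')
print(result)  # <foo a="x" b="y"/>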
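One caveat with the pattern above: `[^>]+` needs at least one character after the tag name, so attribute-less pairs like `<hr></hr>` and `<br></br>`, the exact cases from the quoted report, are left unchanged. A slightly stricter variant, offered as a sketch rather than something tested against modify.py, captures the tag name explicitly, makes the attributes optional, and requires whitespace between the name and any attributes so that a mismatched `<foobar></foo>` is not collapsed. It still assumes no attribute value contains a literal `>`. The function name `collapse_empty_elements` is just for illustration.

```python
import re

# Sketch: collapse empty <tag ...></tag> pairs into <tag .../>.
# Group 1 is the tag name; group 2 is the (possibly empty) attribute text,
# which must start with whitespace so '<foobar></foo>' is not matched.
# Assumes no attribute value contains a literal '>'.
EMPTY_PAIR = re.compile(r'<([\w:.-]+)((?:\s[^>]*)?)></\1>')

def collapse_empty_elements(markup):
    return EMPTY_PAIR.sub(r'<\1\2/>', markup)

samples = [
    '<foo a="x" b="y"></foo>',
    '<hr></hr>',
    '<br></br>',
    '<img src="cover.jpg" alt=""></img>',
    '<p>text</p>',
]
for s in samples:
    print(collapse_empty_elements(s))
```

If this is applied across the board, it may be worth limiting it to void elements such as `hr`, `br`, and `img`: readers that parse the file as HTML rather than XML treat something like `<div/>` or `<script/>` as an opening tag.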