MobileRead Forums - View Single Post

KevinH · 10-06-2020, 08:50 PM

Okay, I checked the python3lib sanitycheck.py code, and it will treat "<<p>" as a spurious text "<" followed by a tag. Likewise, it will treat "</p>>" or "<p>>" as a tag followed by a spurious text ">".

I could detect both cases by verifying that the text returned from parsing does not contain an illegal ">" or "<" char, unless that text is inside a CDATA section.
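
To make the idea concrete, here is a minimal sketch of that kind of check. It is not the sanitycheck.py code: the regex tokenizer and the name find_stray_angle_brackets are made up for illustration, and it assumes tags never contain a raw "<" or ">" inside attribute values.

```python
import re

# Hypothetical tokenizer (NOT Sigil's sanitycheck.py): split markup into
# CDATA sections, comments, tags, and text. A lone "<" that does not start
# a recognizable tag is returned as a one-character text token.
TOKEN_RE = re.compile(
    r"(?P<cdata><!\[CDATA\[.*?\]\]>)"
    r"|(?P<comment><!--.*?-->)"
    r"|(?P<tag><[/!?]?[A-Za-z][^<>]*>)"
    r"|(?P<text>[^<]+|<)",
    re.DOTALL,
)


def find_stray_angle_brackets(markup):
    """Return (offset, char) pairs for each raw '<' or '>' found in text."""
    problems = []
    for match in TOKEN_RE.finditer(markup):
        # CDATA sections, comments, and tags are skipped; only text is checked.
        if match.lastgroup != "text":
            continue
        for i, ch in enumerate(match.group()):
            if ch in "<>":
                problems.append((match.start() + i, ch))
    return problems


if __name__ == "__main__":
    for sample in ["<<p>", "</p>>", "<p>>", "<p><![CDATA[a < b]]></p>"]:
        print(repr(sample), find_stray_angle_brackets(sample))
```

Because the tag pattern stops at the first ">", a raw ">" inside an attribute value would also be reported as stray text. A real parser would need to handle quoted attribute values properly.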

So making sanity check detect these cases is doable. I will look into doing that.

FWIW, HTML5 parsing rules only require xml escaping a ">" in text if leaving it raw would result in ambiguous parsing, whereas the "<" character should always be xml escaped when used in attribute values and text. Under XHTML, both characters should always be xml escaped when used inside attribute values and text fields.
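
For anyone generating XHTML from a script, one way to follow the stricter rule (always escape both characters) is Python's standard html.escape. The helper names escape_text and escape_attr below are just illustrative wrappers, not part of Sigil or any library.

```python
import html


def escape_text(value):
    """Escape &, < and > for use in element text."""
    return html.escape(value, quote=False)


def escape_attr(value):
    """Escape &, <, > and both quote characters for use in attribute values."""
    return html.escape(value, quote=True)


if __name__ == "__main__":
    print("<p>" + escape_text("a < b > c") + "</p>")
    print('<p title="' + escape_attr('x "y" < z') + '">text</p>')
```

Escaping ">" everywhere is more than HTML5 strictly requires, but it is harmless there and keeps the same output safe for XHTML.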

Post #10 · KevinH (Sigil Developer) · Last edited by KevinH; 10-06-2020 at 08:59 PM. Reason: updating