It crashes for me too. That said, if I run the same test with a div tag (which HTML5 parsing rules do not allow to be left unclosed) instead of a p tag, the non-well-formed code is properly detected.
So the problem is that gumbo is an error-recovering HTML5 parser (like the ones browsers use), not a strict XML parser, while the Reports tool uses Qt's overly strict pure XML parser. As far as gumbo is concerned, an unclosed p tag is okay: the parser will "fix" it by closing the p implicitly, and that is technically allowed by the HTML5 parsing rules.
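To make the mismatch concrete, here is a small Python sketch that feeds the same unclosed-p snippet to a strict XML parser and to a lenient HTML tokenizer. It uses only the standard library: `xml.etree.ElementTree` plays the role of a strict XML parser like the one Reports relies on, and `html.parser` stands in for "a parser that does not reject the input". `html.parser` is not an HTML5 tree builder like gumbo (it only tokenizes and repairs nothing), and the helper names below are hypothetical.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# An unclosed <p>: an HTML5 parser closes it implicitly at </body>,
# but it is not well-formed XML.
SNIPPET = "<html><body><p>Some text</body></html>"
WELL_FORMED = "<html><body><p>Some text</p></body></html>"


def strict_xml_accepts(markup: str) -> bool:
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        # Raised for mismatched tags such as </body> while <p> is still open.
        return False


class StartTagCollector(HTMLParser):
    """Lenient tokenizer: records start tags and never raises on mismatched end tags."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def lenient_html_tags(markup: str) -> list:
    """Return the start tags seen by the lenient tokenizer."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.tags


print(strict_xml_accepts(SNIPPET))      # False: mismatched tag
print(strict_xml_accepts(WELL_FORMED))  # True
print(lenient_html_tags(SNIPPET))       # ['html', 'body', 'p'], no error raised
```

The strict parser refuses the whole document because of one missing end tag, which is the kind of input that trips up Reports, while the lenient side simply carries on.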
The well-formed check (F7) properly detects the unclosed p tag, but it is Python-based and therefore slower than gumbo, which can safely be used in a multithreaded manner.
So either I need to teach gumbo to be pickier and throw more "errors", or we need to rewrite the Reports code to use the gumbo parser instead of the overly strict Qt XML parser.
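For the first option (a parser that complains instead of silently repairing), the extra check amounts to tracking open elements and flagging any element that would only be closed implicitly. The sketch below shows the idea in Python on top of the standard library's `html.parser`. It is an illustration only: a real change would live in gumbo's C code, and `StrictTagChecker`, `check`, and the error strings are hypothetical. Void elements are skipped because they never take an end tag.

```python
from html.parser import HTMLParser

# HTML void elements never have an end tag, so they are never left "open".
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}


class StrictTagChecker(HTMLParser):
    """Reports elements that an HTML5 parser would close implicitly."""

    def __init__(self):
        super().__init__()
        self.stack = []   # open elements as (tag, (line, column))
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing XHTML tag such as <br/> or <div/>: nothing stays open.
        pass

    def handle_endtag(self, tag):
        if tag not in [name for name, _ in self.stack]:
            self.errors.append(f"stray </{tag}> at line {self.getpos()[0]}")
            return
        # Pop up to the matching element; anything above it was never closed.
        while self.stack:
            name, (line, _) = self.stack.pop()
            if name == tag:
                break
            self.errors.append(f"<{name}> opened at line {line} is never closed")


def check(markup: str) -> list:
    """Return a list of well-formedness complaints (empty if none)."""
    checker = StrictTagChecker()
    checker.feed(markup)
    checker.close()
    # Elements still open at end of input were never closed either.
    for name, (line, _) in checker.stack:
        checker.errors.append(f"<{name}> opened at line {line} is never closed")
    return checker.errors


print(check("<html><body><p>Some text</body></html>"))
print(check("<html><body><p>Some text</p></body></html>"))
```

The first call reports the unclosed p; the second returns an empty list. Note that a check like this also flags other end tags HTML5 allows you to omit (li, td, and so on), which is exactly the stricter XHTML behaviour Reports currently expects.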
I think the way forward is to use the gumbo parser to generate the Reports and let Mend handle fixing things when the user asks for it, which should make Reports much less crash-prone in general.
What do you think?