Hi,
No surprise here, but DiapDealer is exactly right. The gumbo parser is an automatic repair parser that uses the exact same html5 parsing rules as all major browsers. It will detect things that even epubcheck will not complain about and happily fix them if allowed. Browsers use these same rules and repair things silently. To verify this, open your browser's developer tools (the Develop menu or the Inspector) and see what the browser did to your code.
For example, in html5 it is illegal to use the numeric character reference for a carriage return:
Code:
&#x000D;
or
&#13;
The spec is quite clear on this ...
Quote:
The numeric character reference forms described above are allowed to reference any Unicode code point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and control characters other than space characters.
Yes, people try to do this to hide a carriage return inside a heading tag, for example (instead of using <br/>).
Technically this is illegal and the well-formed check will barf about it, but the gumbo parser will happily accept it and replace it with the actual carriage return character (which will become a space). This is exactly what browsers will do as well. As more and more ebook reading software on mobile devices becomes webkit/gecko/blink browser based, these things become important.
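If you want to scan your own files for these references before the well-formed check complains, here is a rough Python sketch of the rule quoted above. It is only an illustration and not what gumbo or Sigil does internally. The names is_disallowed and find_illegal_numeric_refs are made up for this example, and treating tab, line feed, and form feed as the permitted "space characters" (and flagging anything above U+10FFFF) is an assumption based on my reading of the spec text.

```python
import re

# Matches hex (&#x...; / &#X...;) and decimal (&#...;) numeric character references.
NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

# Assumption: the control characters still allowed as "space characters"
# are tab, line feed, and form feed (carriage return is excluded by name).
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0C}


def is_disallowed(cp):
    """Return True if a numeric reference to code point cp breaks the quoted rule."""
    if cp == 0x00 or cp == 0x0D:
        return True  # U+0000 and U+000D are named explicitly
    if cp > 0x10FFFF:
        return True  # assumption: beyond the Unicode range is not a code point
    if 0xD800 <= cp <= 0xDFFF:
        return True  # surrogates
    if 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE:
        return True  # noncharacters
    if (cp <= 0x1F or 0x7F <= cp <= 0x9F) and cp not in ALLOWED_CONTROLS:
        return True  # other control characters
    return False


def find_illegal_numeric_refs(markup):
    """Return every disallowed numeric character reference, in document order."""
    hits = []
    for match in NUMERIC_REF.finditer(markup):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if is_disallowed(cp):
            hits.append(match.group(0))
    return hits


sample = "<h1>Part&#x000D;One &#13; &#169; &#x0A;</h1>"
print(find_illegal_numeric_refs(sample))
```

On the sample heading this reports only the two carriage return references and leaves the copyright sign and the line feed alone.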
So although the gumbo-based well-formed check is quite picky, it is also quite right about things (the gumbo version Sigil uses has passed the complete html5 test suite with flying colors) and is safe to turn on. As a result, I enable Mend on Open all of the time so that these little nits are fixed, tag mismatches are handled, etc.
Hope this helps,
KevinH