Buglet?
I have just noticed.
I was editing a book and had got near the end. I went to "Remove Unused Stylesheet Classes". This refused, saying that the HTML was not well formed. I then ran the Well-Formed Check EPUB (F7). This produced no error. On further checking I found I had a "<<" in one of the files. Shouldn't the Well-Formed Check EPUB (F7) pick up such an error, like "Remove Unused Stylesheet Classes" does?
The F7 is a very basic check. Its limitations are why I have epubcheck and FlightCrew installed (on epub2, FlightCrew saves me from having to do a separate check for unused files).
It should detect it. Preview should also have detected it. Please copy the exact xhtml (with the error), zip it up and post it. I will try to see why the well-formed sanity check did not detect it and fix it.
Thanks, KevinH. Feel free to change the actual letters to gibberish if needed.
When I added it before <<p>, it also wasn't flagged.
Without diving into it, my guess here is that gumbo is "fixing" the extra angle bracket before the internal well-formed check is performed, whereas that's not happening with the "Remove Unused CSS Classes" feature. It's possible that something is (or isn't) getting flushed to disk before one or the other of those activities.
Wow! That's weird. Preview doesn't bomb with </p>> but it does with <<p. Sumpin's up!
It's being converted to an entity somewhere. When I Edit as Html with the inspector, I can see the entity.
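One plausible mechanism, sketched with Python's standard `html.parser` as a stand-in for gumbo (this is not Sigil's code path, and the `ReSerializer`/`reserialize` names are hypothetical): an HTML5-style tokenizer treats a `<` that is not followed by a letter, `/`, `!` or `?` as ordinary text, so `<<p>` yields the text `<` followed by a `<p>` tag. Once the stray character is text, any serializer that escapes text writes it back out as `&lt;`, and a stray `>` after a tag comes back as `&gt;`.

```python
import html
from html.parser import HTMLParser


class ReSerializer(HTMLParser):
    """Hypothetical helper: parse HTML, then write it back out with text escaped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []

    def _attrs(self, attrs) -> str:
        # Values are None for bare attributes such as "hidden"; emit them as "".
        return "".join(
            f' {name}="{html.escape(value or "", quote=True)}"' for name, value in attrs
        )

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # A stray "<" or ">" arrives here as plain text and is escaped on output.
        self.out.append(html.escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def reserialize(markup: str) -> str:
    parser = ReSerializer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)


print(reserialize("<<p>text</p>"))  # &lt;<p>text</p>
print(reserialize("<p>text</p>>"))  # <p>text</p>&gt;
```

After such a round trip the file is well formed again, which would explain why a check run on the re-serialized text sees nothing wrong.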
If you test it in W3C's Validation Service:
https://validator.w3.org/#validate_by_input
and give it XHTML with a "</p>>":
Code:
<?xml version="1.0" encoding="utf-8"?>
If you feed it something similar in HTML:
Code:
<!DOCTYPE html>
If you do "<<p>" instead, both the XHTML 1.1 and HTML5 checkers flag it. Must be something obscure/weird in the HTML spec. Reminds me of when I found that bug with the accidental <p">, and KevinH tracked it down. Turns out such a thing IS valid in HTML... but extremely poor practice.
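The difference between "</p>>" and "<<p>" is consistent with XML 1.0 itself: a literal ">" is permitted in character data (only the sequence "]]>" must be escaped), while a "<" that does not begin markup is a fatal well-formedness error. A quick sketch, assuming Python's standard-library `xml.etree.ElementTree` (expat-based) as the strict checker; `is_well_formed` is a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Hypothetical helper: True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# A literal ">" in character data is legal XML, so this parses.
print(is_well_formed("<body><p>text</p>></body>"))  # True
# "<" followed by "<" cannot start a tag, so expat reports "not well-formed".
print(is_well_formed("<body><<p>text</p></body>"))  # False
```

So a parser-only well-formedness check will never complain about a stray ">"; catching it takes an extra scan of the text content.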
Okay, I checked the python3lib sanitycheck.py code, and it will treat "<<p>" as spurious text "<" followed by a tag. It will likewise treat "</p>>" or "<p>>" as a tag followed by spurious text ">".
I could detect both cases by verifying that the text returned from parsing does not contain an illegal ">" or "<" char when not inside a CDATA section. So making the sanity check detect these cases is doable, and I will look into doing that. FWIW, HTML5 parsing rules only require xml escaping a ">" in text if leaving it unescaped would result in ambiguous parsing, whereas the "<" character should always be xml escaped when used in attribute values and text. Under XHTML, both characters should always be xml escaped when used inside attribute values and text fields.
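The text-content check described above can be sketched as a scan of character data for literal "<" or ">", skipping CDATA sections, comments and processing instructions. This is a minimal regex-based sketch, not the actual sanitycheck.py code; `find_unescaped_brackets` is a hypothetical name, and it assumes attribute values contain no ">" (a quoted ">" inside a tag would end the tag early and produce a false positive).

```python
import re

# Markup tokens; anything between them is character data.
# Order matters: CDATA, comments and PIs are tried before ordinary tags.
_MARKUP = re.compile(
    r"<!\[CDATA\[.*?\]\]>"  # CDATA section: "<" and ">" inside are allowed
    r"|<!--.*?-->"          # comment
    r"|<\?.*?\?>"           # XML declaration or processing instruction
    r"|<[^<>]*>",           # start, end or empty tag, or DOCTYPE
    re.DOTALL,
)


def find_unescaped_brackets(markup: str) -> list[tuple[int, str]]:
    """Return (offset, char) for every literal '<' or '>' found in text."""
    problems: list[tuple[int, str]] = []
    pos = 0
    for match in _MARKUP.finditer(markup):
        # Whatever lies between the previous token and this one is text.
        problems.extend(
            (i, markup[i]) for i in range(pos, match.start()) if markup[i] in "<>"
        )
        pos = match.end()
    # Trailing text after the last token.
    problems.extend(
        (i, markup[i]) for i in range(pos, len(markup)) if markup[i] in "<>"
    )
    return problems


for sample in ("<<p>x</p>", "<p>>x</p>", "<p>x</p>>", "<p>a &lt; b</p>"):
    print(sample, find_unescaped_brackets(sample))
```

A "<" that cannot start a tag is skipped by the tag pattern and therefore lands in a text run, which is how the "<<p>" case is caught alongside the stray ">" cases.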
This is now fixed in master. The Well-Formed Check (sanitycheck.py) will now look for and detect missing xml escaping on '>' and '<' chars in text fields, so it will detect all of the '<<p>', '<p>>', and '</p>>' cases (on any tag, of course).
Thank you for the bug report and for helping to improve Sigil!
MobileRead.com is a privately owned, operated and funded community.