#1 |
Hedge Wizard
Posts: 802
Karma: 19999999
Join Date: May 2011
Location: UK/Philippines
Device: Kobo Touch, Nook Simple
|
Buglet?
I have just noticed.
I was editing a book and had got near the end. I went to "remove unused stylesheet classes". This refused, saying that the HTML was not well formed. I then ran the well-formed check on the epub (F7), which reported no error. On further checking I found I had a "<<" in one of the files. Shouldn't the well-formed check (F7) pick up such an error, the way "remove unused stylesheet classes" does?
#2 |
Bibliophagist
Posts: 44,757
Karma: 168431891
Join Date: Jul 2010
Location: Vancouver
Device: Kobo Sage, Libra Colour, Lenovo M8 FHD, Paperwhite 4, Tolino epos
|
The F7 is a very basic check. Its limitations are why I have epubcheck and FlightCrew installed (on epub2, FlightCrew saves me from having to do a separate check for unused files).
|
#3 |
Sigil Developer
Posts: 8,475
Karma: 5703586
Join Date: Nov 2009
Device: many
|
It should detect it. Preview should also have detected it. Please copy the exact XHTML (with the error), zip it up, and post it. I will try to see why the well-formed sanity check did not detect it and fix it.
Thanks, KevinH
Feel free to change the actual letters to gibberish if needed.
#4 |
Grand Sorcerer
Posts: 5,680
Karma: 23983815
Join Date: Dec 2010
Device: Kindle PW2
|
|
#5 |
Resident Curmudgeon
Posts: 79,088
Karma: 144284184
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
|
#6 |
Grand Sorcerer
Posts: 28,356
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
So long as gumbo has been allowed to change the extraneous > to an entity, then no, it's not (other than potential naked text outside of tags). Plus I have no idea if epubcheck concerns itself with (x)html(5) well-formedness strictures.
Without diving into it, my guess here is that gumbo is "fixing" the extra angle-bracket before the internal well-formed check is performed, whereas that's not happening with the "Remove Unused css Classes" feature. It's possible that something is (or isn't) getting flushed to disk before one or the other of those activities.
#7 |
Grand Sorcerer
Posts: 28,356
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Wow! That's weird. Preview doesn't bomb with </p>> but it does with <<p. Sumpin's up!
|
#8 |
Grand Sorcerer
Posts: 28,356
Karma: 203720150
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
It's being converted to an entity somewhere. When I Edit as HTML with the inspector, I can see the entity.
|
#9 |
Wizard
Posts: 2,306
Karma: 13057279
Join Date: Jul 2012
Device: Kobo Forma, Nook
|
If you test it in W3C's Validation Service:
https://validator.w3.org/#validate_by_input
And give it XHTML with a "</p>>":
Code:

    <?xml version="1.0" encoding="utf-8"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
      "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
      <title></title>
    </head>
    <body>
      <p>Test</p>
      <p>And here's an error.</p>>
    </body>
    </html>

If you feed it something similar in HTML:
Code:

    <!DOCTYPE html>
    <html>
    <head>
      <title></title>
    </head>
    <body>
      <p>Test</p>
      <p>And here's an error.</p>>
    </body>

If you do "<<p>" instead, both the XHTML 1.1 and HTML5 checkers ping it. Must be something obscure/weird in the HTML spec.
Reminds me of when I found that bug with the accidental <p">, and KevinH tracked it down. Turns out such a thing IS valid in HTML... but extremely poor practice.
Last edited by Tex2002ans; 10-06-2020 at 07:06 PM.
#10 |
Sigil Developer
Posts: 8,475
Karma: 5703586
Join Date: Nov 2009
Device: many
|
Okay, I checked the python3lib sanitycheck.py code, and it will treat "<<p>" as a spurious text "<" followed by a tag. And it will treat "</p>>" or "<p>>" as a tag followed by a spurious text ">".
I could detect both cases by verifying that the text returned from parsing does not contain an illegal > or < char when not inside a CDATA section. So making the sanity check detect these cases is doable. I will look into doing that.
FWIW, HTML5 parsing rules only require XML-escaping a ">" in text if it would otherwise result in ambiguous parsing, whereas the "<" character should always be escaped when used in attribute values and text. Under XHTML, both characters should always be escaped when used inside attribute values and text fields.
Last edited by KevinH; 10-06-2020 at 08:59 PM. Reason: updating
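For anyone who wants to experiment with the idea described above, here is a minimal Python sketch of that kind of check. It is not the sanitycheck.py code: the function names, the regex, and the (line, column, char) return format are all assumptions made for illustration. It splits the source into markup and text with a simple pattern, then reports any bare "<" or ">" left over in the text. One known limitation: a ">" inside a quoted attribute value will be reported as a false positive, because the tag pattern stops at the first ">".

```python
import re

# Hypothetical, simplified markup pattern (NOT Sigil's actual parser).
# Order matters: comments and CDATA are tried before the generic "<!...>" form.
_MARKUP = re.compile(
    r"<!--.*?-->"            # comments
    r"|<!\[CDATA\[.*?\]\]>"  # CDATA sections (bare < and > are legal inside)
    r"|<\?.*?\?>"            # XML declaration / processing instructions
    r"|<![^<>]*>"            # doctype without an internal subset
    r"|</?[A-Za-z][^<>]*>",  # start, end, and self-closing tags
    re.DOTALL,
)


def _scan_text(source, start, end, problems):
    """Record every bare '<' or '>' in source[start:end] as (line, col, char)."""
    for i in range(start, end):
        ch = source[i]
        if ch in "<>":
            line = source.count("\n", 0, i) + 1
            col = i - (source.rfind("\n", 0, i) + 1) + 1
            problems.append((line, col, ch))


def find_unescaped_angle_brackets(xhtml):
    """Return a list of (line, column, char) for bare '<' or '>' found in text."""
    problems = []
    pos = 0
    for match in _MARKUP.finditer(xhtml):
        # Everything between two pieces of markup is treated as text.
        _scan_text(xhtml, pos, match.start(), problems)
        pos = match.end()
    _scan_text(xhtml, pos, len(xhtml), problems)
    return problems


if __name__ == "__main__":
    for sample in ("<<p>Test</p>", "<p>Test</p>>", "<p>>Test</p>"):
        print(sample, find_unescaped_angle_brackets(sample))
```

Line and column are 1-based, so the output can be matched against what an editor shows in its status bar.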
#11 |
Sigil Developer
Posts: 8,475
Karma: 5703586
Join Date: Nov 2009
Device: many
|
This is now fixed in master. Well-Formed Check (sanitycheck.py) will now look for and detect missing XML escaping on '>' and '<' chars in text fields. So it will detect all of the '<<p>', '<p>>', and '</p>>' cases (on any tag, of course).
Thank you for the bug report and helping to improve Sigil!
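When the check flags a bracket that was meant as literal text rather than a stray typo, the usual remedy is to escape it. A small sketch using Python's standard library follows; the helper name `escape_text` is made up for this example, and it assumes the input is raw, not-yet-escaped text (already-escaped text would be escaped a second time).

```python
from xml.sax.saxutils import escape


def escape_text(text):
    """Escape &, <, and > so the string is safe as XHTML character data."""
    return escape(text)


if __name__ == "__main__":
    print(escape_text("if a < b and b > c"))
```

For attribute values the quote character in use needs escaping as well; `xml.sax.saxutils.quoteattr` handles that case.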