MobileRead Forums

MobileRead Forums (https://www.mobileread.com/forums/index.php)
-   Sigil (https://www.mobileread.com/forums/forumdisplay.php?f=203)
-   -   Buglet? (https://www.mobileread.com/forums/showthread.php?t=333764)

Thasaidon 10-06-2020 02:40 AM

Buglet?
 
I have just noticed.

I was editing a book and had got near the end. I went to "remove unused stylesheet classes". This refused, saying that the html was not well formed.

I then ran the well-formed check epub (F7). This produced no error. On further checking I found I had a "<<" in one of the files.

Shouldn't the well-formed check epub (F7) pick up such an error, the way "remove unused stylesheet classes" does?

DNSB 10-06-2020 11:49 AM

The F7 is a very basic check. Its limitations are why I have epubcheck and FlightCrew installed (on epub2, FlightCrew saves me from having to do a separate check for unused files).

KevinH 10-06-2020 03:50 PM

It should detect it. Preview should also have detected it. Please copy the exact xhtml (with the error) and zip it up and post it. I will try to see why the well-formed sanity check did not detect it and fix it.

Thanks,

KevinH

Feel free to change the actual letters to gibberish if needed.

Doitsu 10-06-2020 04:22 PM

Quote:

Originally Posted by KevinH (Post 4044039)
It should detect it.

I just added an additional angle bracket to a </p>> tag and F7 didn't complain about it.

Spoiler:
Code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
  <title></title>
</head>

<body>
  <p>&nbsp;</p>>
</body>
</html>



When I added it before <<p>, it also wasn't flagged.

JSWolf 10-06-2020 04:29 PM

Quote:

Originally Posted by Doitsu (Post 4044051)
I just added an additional angle bracket to a </p>> tag and F7 didn't complain about it.

epubcheck does not catch </p>>. Is <p>> really an error?

DiapDealer 10-06-2020 07:41 PM

Quote:

Originally Posted by JSWolf (Post 4044054)
epubcheck does not catch </p>>. Is <p>> really an error?

So long as gumbo has been allowed to change the extraneous > to an entity, then no, it's not (other than potential naked text outside of tags). Plus I have no idea if epubcheck concerns itself with (x)html(5) well-formedness strictures.

Without diving into it, my guess here is that gumbo is "fixing" the extra angle-bracket before the internal well-formed check is performed, whereas that's not happening with the "Remove Unused css Classes" feature. It's possible that something is (or isn't) getting flushed to disk before one or the other of those activities.

DiapDealer 10-06-2020 07:46 PM

Wow! That's weird. Preview doesn't bomb with </p>> but it does with <<p. Sumpin's up!

DiapDealer 10-06-2020 07:48 PM

It's being converted to an entity somewhere. When I Edit as Html with the inspector, I can see the entity.

Tex2002ans 10-06-2020 08:02 PM

If you test it in W3C's Validation Service:

https://validator.w3.org/#validate_by_input

And give it XHTML with a "</p>>":

Code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
<head>
  <title></title>
</head>

<body>
  <p>Test</p>
  <p>And here's an error.</p>>
</body>
</html>

you get a "character data is not allowed here" error.

If you feed it something similar as HTML5:

Code:

<!DOCTYPE html>
<html>
<head>
  <title></title>
</head>

<body>
  <p>Test</p>
  <p>And here's an error.</p>>
</body>

no such error. It thinks it's fine...

If you do "<<p>" instead, both the XHTML 1.1 and HTML5 checkers flag it.

Must be something obscure/weird in the HTML spec. Reminds me of when I found that bug with the accidental <p">, and KevinH tracked it down. Turns out such a thing IS valid in HTML... but extremely poor practice.
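
For anyone who wants to reproduce the XML side of this locally, here is a minimal sketch using Python's standard xml.etree.ElementTree. It is neither the W3C validator nor Sigil's own check. A plain XML parser only tests well-formedness, so it should accept the stray ">" as ordinary character data while rejecting "<<p>". The "character data is not allowed here" message above comes from DTD validation, which this sketch does not attempt. The helper name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xhtml):
    """Return True if the string parses as XML (well-formedness only, no DTD validation)."""
    try:
        ET.fromstring(xhtml)
        return True
    except ET.ParseError:
        return False


# Hypothetical minimal page; the body content is filled in per test case.
TEMPLATE = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title></title></head>"
    "<body>{}</body></html>"
)

stray_gt = TEMPLATE.format("<p>Test</p>>")   # extra ">" after the end tag
stray_lt = TEMPLATE.format("<<p>Test</p>")   # extra "<" before the start tag

print(is_well_formed(stray_gt))  # the raw ">" is legal character data in XML
print(is_well_formed(stray_lt))  # the raw "<" is not
```

So a stray ">" slips past a pure well-formedness test and has to be caught by a separate check.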

KevinH 10-06-2020 09:50 PM

Okay I checked the python3lib sanitycheck.py code and it will treat "<<p>" as a spurious text "<" followed by a tag. And it will treat "</p>>" or "<p>>" as a tag followed by a spurious text ">".

I could detect both cases by verifying that the text returned from parsing does not contain an illegal > or < char when it is not inside a CDATA section.

So making sanity check detect these cases is doable. I will look into doing that.

FWIW, HTML5 parsing rules only require xml escaping a ">" in text if leaving it raw would make parsing ambiguous, whereas the "<" character should always be xml escaped when used in attribute values and text. Under XHTML, both characters should always be xml escaped when used inside attribute values and text fields.
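
The idea described above (look at the text between tags and complain about any raw "<" or ">") can be sketched roughly as follows. This is not the actual sanitycheck.py code; the regex tokenizer, the function name find_unescaped_brackets, and the sample document are all assumptions made for illustration.

```python
import re

# Spans that count as markup rather than text. Anything between two of
# these matches is treated as a text field.
MARKUP = re.compile(
    r"<!--.*?-->"              # comment
    r"|<!\[CDATA\[.*?\]\]>"    # CDATA section (raw < and > are allowed inside)
    r"|<\?.*?\?>"              # xml declaration / processing instruction
    r"|<![^>]*>"               # doctype
    r"|</?[A-Za-z][^<>]*>",    # start, end, or self-closing tag
    re.DOTALL,
)


def find_unescaped_brackets(xhtml):
    """Return (line_number, char) for each raw '<' or '>' found outside markup."""
    problems = []
    pos = 0
    # The trailing None lets the loop also scan the text after the last match.
    for match in list(MARKUP.finditer(xhtml)) + [None]:
        end = match.start() if match else len(xhtml)
        for offset in range(pos, end):
            if xhtml[offset] in "<>":
                line = xhtml.count("\n", 0, offset) + 1
                problems.append((line, xhtml[offset]))
        if match:
            pos = match.end()
    return problems


sample = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title></title></head>
<body>
  <<p>first</p>
  <p>second</p>>
  <p>1 &lt; 2 &gt; 0</p>
</body>
</html>
"""

print(find_unescaped_brackets(sample))  # [(6, '<'), (7, '>')]
```

Because the tag pattern refuses any "<" or ">" inside a tag, an unescaped bracket in an attribute value is also reported, which matches the XHTML rule that both characters should be escaped there. A real implementation would work from the parser's own tokens rather than a regex.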

KevinH 10-07-2020 11:38 AM

This is now fixed in master. Well-Formed Check (sanitycheck.py) will now look for and detect missing xml escaping on '>' and '<' chars in text fields. So it will detect all of the '<<p>', '<p>>', and '</p>>' cases (on any tag, of course).

Thank you for the bug report and helping to improve Sigil!


All times are GMT -4.