10-28-2016, 12:00 AM | #1 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
well formed - or not?
Hi
I use the Sigil plugin ODTImport to produce either EPUB2 or EPUB3. Sigil's well-formedness check (F7) reports no problems for one of my latest EPUB2 books (with xhtml files), the Calibre editor agrees (apart from a warning about the naming of font files), and EPUBCHECK has absolutely no remark about this EPUB. However, when opening this same EPUB, Sigil gives a warning and tells me that it can't update the book because one of the files contains badly formed HTML. I can nevertheless save the book. If I let Sigil correct this file ("mend all html files"; it's a table covering several pages), it adds some tags (like "<colgroup>" enclosing two "<col>" tags, or "<tbody>" enclosing all "<tr>" tags) which seem useful but not really necessary. My question is: how can an ebook be well formed and badly formed at the same time? |
10-28-2016, 05:30 AM | #2 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
You can reproduce roger64's issue with the following HTML sample code, which is not identical to roger64's code, but produces the same error:
Before cleaning:
Code:
<table>
  <col width="80"/>
  <col width="100"/>
  <col width="320"/>
  <tr>
    <td>row 1, col 1</td>
    <td>row 1, col 2</td>
    <td>row 1, col 3</td>
  </tr>
</table>
After cleaning:
Code:
<table>
  <colgroup>
    <col width="80"/>
    <col width="100"/>
    <col width="320"/>
  </colgroup>
  <tbody>
    <tr>
      <td>row 1, col 1</td>
      <td>row 1, col 2</td>
      <td>row 1, col 3</td>
    </tr>
  </tbody>
</table>
AFAIK, the well-formedness check can't catch these issues, because it can only flag problems with existing tags, i.e. it can only find unclosed or improperly nested tags. BTW, epubcheck didn't find these problems either.
Last edited by Doitsu; 10-28-2016 at 05:35 AM. |
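To illustrate the point about tag-level checks, here is a minimal Python sketch. It is not Sigil's actual well-formedness code; it assumes a generic XML parser from the Python standard library as a stand-in, and the BROKEN sample is a made-up example that is not from roger64's book. Both table samples parse without complaint, and only the mis-nested one is rejected.

```python
import xml.etree.ElementTree as ET

# The sample table before "mend" (no <colgroup>, no <tbody>).
BEFORE = """<table>
  <col width="80"/> <col width="100"/> <col width="320"/>
  <tr> <td>row 1, col 1</td> <td>row 1, col 2</td> <td>row 1, col 3</td> </tr>
</table>"""

# The same table after "mend" added the wrapper elements.
AFTER = """<table>
  <colgroup> <col width="80"/> <col width="100"/> <col width="320"/> </colgroup>
  <tbody>
    <tr> <td>row 1, col 1</td> <td>row 1, col 2</td> <td>row 1, col 3</td> </tr>
  </tbody>
</table>"""

# Hypothetical counter-example: </tr> closes before </td>.
BROKEN = "<table><tr><td>row 1, col 1</tr></td></table>"


def is_well_formed(markup):
    """Return True if the markup parses as XML (tags closed and properly nested)."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


for label, source in (("before", BEFORE), ("after", AFTER), ("broken", BROKEN)):
    print(label, is_well_formed(source))
```

A check of this kind knows nothing about which elements a table is expected to contain, so it cannot tell the "before" and "after" versions apart.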
10-28-2016, 08:35 AM | #3 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your explanation.
Maybe the <tbody> tag is not always necessary (see below "tag omission")? https://www.w3.org/TR/html-markup/tbody.html |
10-28-2016, 11:08 AM | #4 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
HTML rules for omitting tags are *not* followed in xhtml (epubs), as xhtml does not allow omitting tags (opening, closing or otherwise). So while that page might be valid html, it will not be valid xhtml according to the rules.
And fwiw, "well-formed" seems to be a continuum. For some tools it just means the file follows the basic structure and has matched opening and closing tags. For other tools, it might also include whether the tags are used in the proper order (i.e. have proper parents), etc. The xhtml rules are much stricter than the html rules. So one html tool can say that the html is well formed, while an xhtml checker might barf over missing tags which were perfectly legal under html, just not under xhtml.
To complicate things even further, epub 3.1 (4?), when it is released some time in the future, will allow html parsing rules as well as xhtml parsing rules. It will be a real mess for any single parser to deal with.
Hope this helps. KevinH |
10-28-2016, 11:30 AM | #5 |
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
|
@KevinH: It appears that the Sigil well-formedness check is less strict than the test that Sigil performs upon opening an epub.
Would it be possible to: a) have the well-formedness test use the same code that Sigil uses when opening an epub? or b) add a menu option that'll execute the same check that Sigil performs when opening an epub? |
10-28-2016, 12:22 PM | #6 |
Grand Sorcerer
Posts: 27,542
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Defer to Kevin if his answer contradicts mine, but...
Sigil's well-formedness test is intended to be the barest of minimum checks: are tags closed, are tags properly nested. The opening/saving/mend tests are done by the Gumbo parser. It just fixes what's wrong (provided the clean source settings are checked). Gumbo doesn't really have any "tell me what's wrong" functionality, so there's no easy way to use it in any kind of reporting capacity. To tell the truth, I'd almost be in favor of removing the well-formedness check altogether. 1) People seem to want to rely on it for more than what it was intended for. And 2) the preview window tends to immediately inform someone (pink box) when entered/altered code is fatally incorrect anyway. |
10-28-2016, 01:41 PM | #7 | |
A Hairy Wizard
Posts: 3,093
Karma: 18727053
Join Date: Dec 2012
Location: Charleston, SC today
Device: iPhone 11/X/6/iPad 1,2,Air & Air Pro/Surface Pro/Kindle PW & Fire
|
Quote:
Just some ideas. |
|
10-28-2016, 03:05 PM | #8 |
Sigil Developer
Posts: 7,608
Karma: 5433388
Join Date: Nov 2009
Device: many
|
Hi,
The well-formed check in Sigil does use the SanityChecker plugin code (just added permanently to Sigil and not as a plugin anymore). It is there to make sure that every opening tag has the proper closing tag and is properly nested, and that is about it for the most part. If that level of consistency is present, most other software can at least successfully parse the code. During parsing, the parser may detect other errors and automatically fix them (like our Gumbo parser) or barf and throw up its hands. When you load a file into the gumbo error-correcting parser, it does much the same thing as every web browser does and tries its best to create something actually usable from whatever is provided to it. So the "well-formed" check in Sigil could be renamed to "Sanity Check" or "Parseable?" or something along those lines, but it certainly is not equivalent to gumbo checking the file, or for that matter to things like FlightCrew and Epubcheck. If you want those styles of checks, you need to use the appropriate plugin. Hope this helps, KevinH |
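Since an error-correcting parser fixes things silently, a separate report of what it would change can be useful. The following Python sketch is only an illustration of that idea and is not based on Gumbo or on any Sigil code: the function names are hypothetical, and the two rules simply mirror the <colgroup> and <tbody> wrappers mentioned earlier in this thread. It describes what "mend" would add, not whether the file is valid.

```python
import xml.etree.ElementTree as ET

BEFORE = """<table>
  <col width="80"/> <col width="100"/> <col width="320"/>
  <tr> <td>row 1, col 1</td> <td>row 1, col 2</td> <td>row 1, col 3</td> </tr>
</table>"""

AFTER = """<table>
  <colgroup> <col width="80"/> <col width="100"/> <col width="320"/> </colgroup>
  <tbody>
    <tr> <td>row 1, col 1</td> <td>row 1, col 2</td> <td>row 1, col 3</td> </tr>
  </tbody>
</table>"""


def local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def implied_wrappers(markup):
    """List the wrapper elements a mend step would likely insert into tables."""
    notes = []
    for element in ET.fromstring(markup).iter():
        if local_name(element.tag) != "table":
            continue
        children = {local_name(child.tag) for child in element}
        if "col" in children:
            notes.append("<col> directly inside <table>: <colgroup> would be added")
        if "tr" in children:
            notes.append("<tr> directly inside <table>: <tbody> would be added")
    return notes


for label, source in (("before", BEFORE), ("after", AFTER)):
    print(label, implied_wrappers(source))
```

Run against the "before" sample, the sketch reports both wrappers; against the "after" sample it reports nothing.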
10-28-2016, 10:10 PM | #9 |
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
|
Thanks for your useful explanations.
This example shows that, sometimes, the gumbo parser is more royalist than the King (Epubcheck). The main point is that it's right and sound. Maybe it deserves better than the stern warning Sigil issues before using it if you select the option to mend on opening: it says roughly "Sigil can automatically correct your html but you can lose data in the process", which has always put me off up to now.
I did a second try from the same odt file. I asked ODTImport to produce an EPUB3 without asking Sigil to mend anything. This time, nearly everything was correct for the table, including colgroup and tbody tags. Mystery...
Two small things about this EPUB3:
1- One that can be easily corrected using Sigil is changing the extension of all the files (from html(5) to xhtml) to please Epubcheck (the gumbo parser is undisturbed by this trifle).
2- Epubcheck also signals an error about a deprecated (for EPUB3 only) "cellspacing" attribute on this table, which does not disturb the gumbo parser. It's also easy to take out this attribute, though maybe not for a beginner (I use a saved regex to weed it out).
Last edited by roger64; 10-28-2016 at 10:29 PM. |
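The saved regex itself is not shown in this thread, so the pattern below is only a guess at what such a clean-up could look like, written as a small Python sketch. The sample markup and the function name are hypothetical. Note that a plain regex does not know about tags, so it would also match the same text if it appeared in the body of a paragraph.

```python
import re

# Matches leading whitespace plus cellspacing="..." or cellspacing='...'.
CELLSPACING = re.compile(r"""\s+cellspacing\s*=\s*(?:"[^"]*"|'[^']*')""", re.IGNORECASE)


def strip_cellspacing(markup):
    """Remove every quoted cellspacing attribute from the given markup."""
    return CELLSPACING.sub("", markup)


sample = '<table cellspacing="0" class="grid"><tr><td>x</td></tr></table>'
cleaned = strip_cellspacing(sample)
print(cleaned)  # <table class="grid"><tr><td>x</td></tr></table>
```

In Sigil's Find & Replace, the same pattern could presumably be used in regex mode with an empty replacement, but that is untested here.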
10-29-2016, 08:31 AM | #10 | |
Groupie
Posts: 171
Karma: 3517858
Join Date: May 2016
Location: Monterrey, Mexico
Device: Samsung Tab-3 7"
|
Quote:
Last edited by JustinThought; 10-29-2016 at 08:35 AM. |
|
10-29-2016, 09:00 AM | #11 | |
Grand Sorcerer
Posts: 27,542
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Quote:
It's an "at a quick glance, nothing seems to have gone terribly wrong" tool. Not an "everything in your epub is absolutely perfect" tool. |
|
10-29-2016, 02:28 PM | #12 | |
Groupie
Posts: 171
Karma: 3517858
Join Date: May 2016
Location: Monterrey, Mexico
Device: Samsung Tab-3 7"
|
Quote:
|
|
10-29-2016, 02:54 PM | #13 | |
null operator (he/him)
Posts: 20,538
Karma: 26944418
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
Quote:
BR |
|