Old 10-28-2016, 12:00 AM   #1
roger64
Wizard
well formed - or not?

Hi

I use the Sigil plugin ODTImport to produce either EPUB2 or EPUB3. One of my latest EPUB2 books (with xhtml files) has been judged by Sigil (using F7) as having no problems (well formed), the Calibre editor is of the same opinion (apart from a warning about the naming of font files), and EPUBCHECK has absolutely no remarks about this EPUB.

However, on opening this same EPUB, Sigil gives a warning and tells me that it can't update the book because one of the files contains badly formed HTML. I can nevertheless save the book.

If I let it correct this file ("mend all html files"), which holds a table covering several pages, it adds some tags (like "<colgroup>" enclosing two "<col>" tags, or "<tbody>" enclosing all "<tr>" tags) which seem useful but not really necessary.

My question is: how can an ebook be at the same time well and badly formed?
Old 10-28-2016, 05:30 AM   #2
Doitsu
Grand Sorcerer
You can reproduce roger64's issue with the following HTML sample code, which is not identical to roger64's code, but produces the same error:

Before cleaning:

Code:
<table> 
    <col width="80"/> 
    <col width="100"/> 
    <col width="320"/> 
    <tr> 
        <td>row 1, col 1</td> 
        <td>row 1, col 2</td> 
        <td>row 1, col 3</td> 
    </tr> 
</table>
After cleaning:

Code:
  <table>
    <colgroup>
      <col width="80"/>
      <col width="100"/>
      <col width="320"/>
    </colgroup>

    <tbody>
      <tr>
        <td>row 1, col 1</td>
        <td>row 1, col 2</td>
        <td>row 1, col 3</td>
      </tr>
    </tbody>
  </table>
@roger64: Your original table is invalid because it was missing <colgroup> and <tbody> tags.

AFAIK, the well-formedness check can't catch these issues, because it can only flag problems with existing tags, i.e. it can only find missing closing tags or improperly nested tags.
BTW, epubcheck didn't find these problems either.
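
A minimal Python sketch of why a pure well-formedness check passes the "before" sample from this post. It uses the standard library's xml.etree.ElementTree as a stand-in for such a check; that is an assumption for illustration, not the code Sigil runs. The helper names are made up. The sample parses cleanly even though the <col> and <tr> elements sit directly under <table>:

```python
import xml.etree.ElementTree as ET

# The "before cleaning" sample from this post.
BEFORE = """<table>
    <col width="80"/>
    <col width="100"/>
    <col width="320"/>
    <tr>
        <td>row 1, col 1</td>
        <td>row 1, col 2</td>
        <td>row 1, col 3</td>
    </tr>
</table>"""


def is_well_formed(markup):
    """Return True if the markup parses as XML (tags closed and nested)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def direct_children(markup):
    """List the tag names found directly under the root element."""
    return [child.tag for child in ET.fromstring(markup)]


print(is_well_formed(BEFORE))   # True
print(direct_children(BEFORE))  # ['col', 'col', 'col', 'tr']
```

The parser only confirms that every tag is closed and nested. Whether <colgroup> or <tbody> wrappers are present is a structural question it never asks.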

Last edited by Doitsu; 10-28-2016 at 05:35 AM.
Old 10-28-2016, 08:35 AM   #3
roger64
Wizard
Thanks for your explanation.

Maybe the <tbody> tag is not always necessary (see below "tag omission")?
https://www.w3.org/TR/html-markup/tbody.html
Old 10-28-2016, 11:08 AM   #4
KevinH
Sigil Developer
Hi,

Html rules for omitting tags are *not* followed in xhtml (epubs) as xhtml does not allow for omitting tags (opening, closing or otherwise). So while that page might be valid html, it will not be valid xhtml according to the rules.

And fwiw "well-formed" seems to be a continuum. For some tools it just means it follows the basic structure and has matched opening and closing tags. For other tools, it might include if the tags are used in the proper order (ie. has proper parents), etc.

The xhtml rules are much stricter than html rules. So one html tool can say that the html is well formed while another xhtml checker might barf over missing tags which were perfectly legal under html just not under xhtml.
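
To illustrate the difference, here is a small Python sketch using only the standard library. The sample string and helper names are made up for illustration. Note that html.parser is just a lenient tokenizer, not a full HTML tree builder, so treat it as a rough stand-in for "an html tool". It reads a list with omitted </li> tags without complaint, while a strict XML parser rejects the same text:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Legal HTML (the </li> end tags may be omitted), but not well-formed XML.
SAMPLE = "<ul><li>one<li>two</ul>"


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer sees."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Tokenize markup as HTML and return the start tags encountered."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def parses_as_xml(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_start_tags(SAMPLE))  # ['ul', 'li', 'li']
print(parses_as_xml(SAMPLE))    # False
```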

To complicate things even further, epub 3.1 (4?) when it is released some time in the future will allow html parsing rules as well as xhtml parsing rules. It will be a real mess for any single parser to deal with.

Hope this helps.

KevinH
Old 10-28-2016, 11:30 AM   #5
Doitsu
Grand Sorcerer
@KevinH: It appears that the Sigil wellformedness check is less strict than the test that Sigil performs upon opening an epub.

Would it be possible to:

a) have the wellformedness test use the same code that Sigil uses when opening an epub?

or

b) add a menu option that'll execute the same check that Sigil performs when opening an epub?
Old 10-28-2016, 12:22 PM   #6
DiapDealer
Grand Sorcerer
Defer to Kevin if his answer contradicts mine, but...

Sigil's well-formedness test is intended to be the barest of minimum checks: are tags closed, and are tags properly nested?

The opening/saving/mend tests are the Gumbo parser. It just fixes what's wrong (provided the clean source settings are checked). Gumbo doesn't really have any "tell me what's wrong" functionality, so there's no easy way to use it in any kind of reporting capacity.

To tell the truth, I'd almost be in favor of removing the wellformedness check altogether. 1) People seem to want to rely on it for more than what it was intended for. And 2) The preview window tends to immediately inform someone (pink box) when entered/altered code is fatally incorrect anyway.
Old 10-28-2016, 01:41 PM   #7
Turtle91
A Hairy Wizard
Quote:
Originally Posted by DiapDealer
To tell the truth, I'd almost be in favor of removing the wellformedness check altogether. 1) People seem to want to rely on it for more than what it was intended for. And 2) The preview window tends to immediately inform someone (pink box) when entered/altered code is fatally incorrect anyway.
My .02: I use the pink box in the preview window as well. Although it would be really nice if you could replace the "wellformedness check" (F7) with the code used in the SanityChecker plugin.... or some other easy way to identify when a particular file fails the wellformedness check. I think one suggestion mentioned in a previous thread was to make the file name bold or red in the Book Browser so the user can easily see which file is causing the problem. Or maybe putting a selectable list of offending files down in the Validation Results window when you click F7...

Just some ideas.
Old 10-28-2016, 03:05 PM   #8
KevinH
Sigil Developer
Hi,

The well-formed check in Sigil does use the SanityChecker plugin code (just added permanently to Sigil and not as a plugin anymore). It is there to make sure that every opening tag has the proper closing tag and properly nested and that is about it for the most part. If that level of consistency is present, most other software can at least successfully parse the code.

During parsing, the parser may detect other errors and either fix them automatically (like our Gumbo parser) or barf and throw up its hands. When you load a file into the Gumbo error-correcting parser, it does much the same thing as every web browser does and tries its best to create something actually usable from whatever is provided to it.
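
As a rough illustration of this kind of silent repair, here is a Python sketch that reproduces only the one fix seen earlier in this thread: wrapping stray <col> children in <colgroup> and stray <tr> children in <tbody>. It is not Gumbo's algorithm (Gumbo is a C library that follows the HTML5 parsing rules). The function names are hypothetical, and the sketch assumes un-namespaced markup and simply gathers all matching children into one wrapper:

```python
import xml.etree.ElementTree as ET


def wrap_children(table, child_tag, wrapper_tag):
    """Move direct children named child_tag into one new wrapper element."""
    children = [el for el in list(table) if el.tag == child_tag]
    if not children:
        return
    # Remember where the first stray child sat so the wrapper takes its place.
    position = list(table).index(children[0])
    wrapper = ET.Element(wrapper_tag)
    for el in children:
        table.remove(el)
        wrapper.append(el)
    table.insert(position, wrapper)


def mend_table(markup):
    """Parse a <table> fragment and add the wrappers it lacks."""
    table = ET.fromstring(markup)
    wrap_children(table, "col", "colgroup")
    wrap_children(table, "tr", "tbody")
    return table


mended = mend_table('<table><col width="80"/><tr><td>a</td></tr></table>')
print([child.tag for child in mended])  # ['colgroup', 'tbody']
print(ET.tostring(mended, encoding="unicode"))
```

Like the real mend step, this changes the source without reporting what was wrong, which is why it cannot double as a checker.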

So the "well-formed" check in Sigil could be renamed to "Sanity Check" or "Parseable?" or something along those lines, but it certainly is not equivalent to Gumbo checking the file, or for that matter to things like FlightCrew and Epubcheck. If you want those styles of checks, you need to use the appropriate plugin.

Hope this helps,

KevinH
Old 10-28-2016, 10:10 PM   #9
roger64
Wizard
Thanks for your useful explanations.

This example shows that, sometimes, the Gumbo parser is more royalist than the King (Epubcheck). The main point is that it's right and sound. Maybe it deserves better than the stern warning Sigil issues before using it if you select the option to mend on opening: it says roughly "Sigil can correct automatically your html but you can lose data in the process", which has always put me off up to now.

I did a second try from the same odt file. I asked ODTImport to produce an EPUB3 without asking Sigil to mend anything. This time, nearly everything was correct for the table, including colgroup and tbody tags. Mystery...

Two small things about this EPUB3:
1- One that can be easily corrected using Sigil is changing the extension name of all the files (from html(5) to xhtml) to please Epubcheck. (the gumbo parser is undisturbed about this trifle).

2- Epubcheck also signals an error about a deprecated (for EPUB3 only) "cellspacing" attribute on this table, which does not disturb the Gumbo parser. It's also easy to take out this attribute, though maybe not for a beginner. (I use a saved regex to weed it out.)
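
For readers who have no such regex saved, here is a minimal Python sketch of one way to do it. The pattern is an assumption written for illustration, not the regex mentioned in this post. It expects quoted attribute values and no ">" inside any attribute value:

```python
import re

# Matches a quoted cellspacing attribute, plus its leading whitespace,
# inside a <table ...> start tag. Group 1 keeps everything before it.
CELLSPACING = re.compile(
    r'(<table\b[^>]*?)\s+cellspacing\s*=\s*(?:"[^"]*"|\'[^\']*\')',
    re.IGNORECASE,
)


def strip_cellspacing(markup):
    """Remove the cellspacing attribute from every <table> start tag."""
    return CELLSPACING.sub(r"\1", markup)


print(strip_cellspacing('<table cellspacing="0" class="t1">'))
# <table class="t1">
```

The visual spacing can then be handled in the stylesheet, for example with the CSS border-spacing property.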

Last edited by roger64; 10-28-2016 at 10:29 PM.
Old 10-29-2016, 08:31 AM   #10
JustinThought
Groupie
Quote:
Originally Posted by DiapDealer

To tell the truth, I'd almost be in favor of removing the wellformedness check altogether. 1) People seem to want to rely on it for more than what it was intended for. And 2) The preview window tends to immediately inform someone (pink box) when entered/altered code is fatally incorrect anyway.
No! No! Sometimes I do something REALLY stupid (I'm very good at that!) with a search and replace, and somewhere in one of the many files that make up the book, there's a real problem. This function helps me to find which file contains the error without having to page through them one-by-one. I'm not real fond of the idea of letting Sigil correct the problem; once I know which page contains the error, I can then discover what I've done and correct future behavior.
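
The "which file is broken?" use case can also be sketched outside Sigil. The following Python example is an illustration only: the function name and file suffix list are assumptions, and it uses a plain XML parse rather than Sigil's own check. One known limitation: named entities such as &nbsp; will be reported as errors, because the parser does not load the XHTML DTD.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_broken_files(epub, suffixes=(".xhtml", ".html", ".htm")):
    """Return (name, error message) for each content file that fails to parse."""
    broken = []
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(suffixes):
                continue
            try:
                ET.fromstring(archive.read(name))
            except ET.ParseError as err:
                # The message includes the line and column of the failure.
                broken.append((name, str(err)))
    return broken


# Demo on a tiny in-memory archive (not a complete, valid EPUB).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("Text/ch1.xhtml", "<html><body><p>fine</p></body></html>")
    archive.writestr("Text/ch2.xhtml", "<html><body><p>oops</body></html>")
buffer.seek(0)

for name, message in find_broken_files(buffer):
    print(name, "->", message)
```

Passing a path such as "book.epub" instead of the in-memory buffer works the same way, since zipfile.ZipFile accepts either.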

Last edited by JustinThought; 10-29-2016 at 08:35 AM.
Old 10-29-2016, 09:00 AM   #11
DiapDealer
Grand Sorcerer
Quote:
Originally Posted by JustinThought
No! No! Sometimes I do something REALLY stupid (I'm very good at that!) with a search and replace, and somewhere in one of the many files that make up the book, there's a real problem. This function helps me to find which file contains the error without having to page through them one-by-one. I'm not real fond of the idea of letting Sigil correct the problem; once I know which page contains the error, I can then discover what I've done and correct future behavior.
Fair enough. We just have to find a way for people to understand that it's NOT any kind of validator (neither for xhtml nor epubs). It's a quick and dirty test to make sure that all existing tags have been properly closed and that there's been no improper nesting of tags. Nothing more.

It's an "at a quick glance, nothing seems to have gone terribly wrong" tool. Not an "everything in your epub is absolutely perfect" tool.
Old 10-29-2016, 02:28 PM   #12
JustinThought
Groupie
Quote:
Originally Posted by DiapDealer
Fair enough. We just have to find a way for people to understand that it's NOT any kind of validator (neither for xhtml nor epubs). It's a quick and dirty test to make sure that all existing tags have been properly closed and that there's been no improper nesting of tags. Nothing more.

It's an "at a quick glance, nothing seems to have gone terribly wrong" tool. Not an "everything in your epub is absolutely perfect" tool.
Some people like FlightCrew, others like EpubCheck; personally, I like both. Each can find errors that the other ignores; neither guarantees that your code is right. In the final analysis, nothing beats that epub checker that you keep between your ears.
Old 10-29-2016, 02:54 PM   #13
BetterRed
null operator (he/him)
Quote:
Originally Posted by DiapDealer
Fair enough. We just have to find a way for people to understand that it's NOT any kind of validator (neither for xhtml nor epubs). It's a quick and dirty test to make sure that all existing tags have been properly closed and that there's been no improper nesting of tags. Nothing more.

It's an "at a quick glance, nothing seems to have gone terribly wrong" tool. Not an "everything in your epub is absolutely perfect" tool.
Why not call it Quick Check (or even Quick & Dirty Check)? More often than not it's better to name things descriptively in plain English.

BR