MobileRead Forums


JSWolf 01-07-2021 06:51 PM

Not Well formed 1.4.3
My system is Windows 10 Home 64-bit. I have the 64-bit version of Sigil 1.4.3 installed along with the epubcheck plugin.

I was loading an ePub 3 eBook and Sigil popped up the message...

Quote:

This EPUB has HTML files that are not well formed or are missing a DOCTYPE, html, head, or body elements. Sigil can automatically fix these files, although this may result in minor data loss in extreme circumstances.

Do you want to automatically fix the files?
When I clicked no, I then checked the eBook for errors using epubcheck and I received no errors.

Is this a bug with Sigil? If so, can it be fixed? If it's not a bug, what in the eBook code is incorrect?

Here is a scrambled copy of the eBook. The only changes made are that the embedded fonts and all CSS code referencing them were removed. Otherwise, it's the unchanged scrambled code. I did check it with epubcheck and it passed.

KevinH 01-07-2021 07:06 PM

No. If you read the error message, it tells you that the file is missing its DOCTYPE (assuming it has html, head, and body tags), which is required by the epub spec. There is an open issue on epubcheck to test for and report this. Calibre does not follow this aspect of the spec. Sigil does, and has for years: until a couple of releases ago, the auto-mending done when moving things to Sigil's standard layout always fixed it. Now that we no longer move things to standard locations, that automatic fix is no longer done.

DNSB 01-07-2021 11:50 PM

Mend and prettify or just Mend will add the missing doctypes. The CSSUndefinedClasses plugin is not happy with running against an epub with those errors so I've been using Mend to fix the issue.

Looking at your scrambled epub, the first block is before mend and prettify, the second is after.

Code:

<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-us" xml:lang="en-us">
  <head>

Code:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops" lang="en-us" xml:lang="en-us">
<head>
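
For anyone who wants to spot the same difference without opening Sigil, here is a minimal Python sketch that checks whether a DOCTYPE appears before the root element. The function name `has_doctype` and the regular expression are illustrative assumptions, not anything Sigil itself uses, and the sample strings are trimmed versions of the two blocks above.

```python
import re

# Optional XML declaration, optional comments, then <!DOCTYPE html.
# This pattern is an assumption for illustration, not Sigil's own check.
_DOCTYPE_RE = re.compile(
    r"^\s*(?:<\?xml[^>]*\?>)?\s*(?:<!--.*?-->\s*)*<!DOCTYPE\s+html\b",
    re.IGNORECASE | re.DOTALL,
)

def has_doctype(xhtml_text: str) -> bool:
    """Return True if a DOCTYPE appears before the root element."""
    # Drop a leading byte-order mark, if any, before matching.
    return _DOCTYPE_RE.match(xhtml_text.lstrip("\ufeff")) is not None

before = """<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>"""

after = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html>

<html xmlns="http://www.w3.org/1999/xhtml">
<head>"""

print(has_doctype(before), has_doctype(after))
```

This only detects the missing DOCTYPE; it does not check for missing html, head, or body elements, which the Sigil message also covers.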


Ashjuk 01-08-2021 05:34 AM

I get that warning message on pretty much every new book I open in Sigil.

I am puzzled as to why Sigil says the book is not well formed because I can open a book (that's not been opened in Sigil) in Freda, ADE, on my Kobo, on my iPad and (shudder) Calibre and none of them complain that the book is malformed.

From my experience it's only Sigil that brings up the warning about the missing DOCTYPE tag. Also, if it is so important, why is it that just about every book I come across is missing it? Even new releases appear not to have it so I can only assume it's not that critical.

DiapDealer 01-08-2021 07:19 AM

Funny. I get the warning on pretty much none of the epubs I open.

Being able to open a book in an ereading program with no warning has never been an indicator of whether the epub in question was spec-compliant or not.

Either start hitting yes to the warning (and subsequently saving the epub after) or get used to seeing the warning. Those are your options.

Ashjuk 01-08-2021 08:02 AM

Quote:

Originally Posted by DiapDealer (Post 4079565)
Funny. I get the warning on pretty much none of the epubs I open.

Being able to open a book in an ereading program with no warning has never been an indicator of whether the epub in question was spec-compliant or not.

Either start hitting yes to the warning (and subsequently saving the epub after) or get used to seeing the warning. Those are your options.

As you say, I just hit yes and let Sigil do its thing. I was just wondering, that's all - no criticism of Sigil.

KevinH 01-08-2021 10:54 AM

Sigil is making the most spec-compliant and consistent epub it can. According to the spec, DOCTYPE is required, and older versions of Sigil quietly fixed this on load because they had to move things to fit Sigil's standard form. Newer versions of Sigil no longer auto fix the missing DOCTYPE on load (but Mend will properly fix it), so Sigil warns the user and offers to auto fix the files for them.

Those same e-readers will work just fine with the DOCTYPE. As I said, epubcheck has an open issue to fix this.

BTW, any epub2 that uses named entities (e.g. nbsp) but is missing the DOCTYPE is technically broken and will not work on most e-readers, because epub2's version of the DOCTYPE is where the named entities are included.

That is why this is important to fix.

Calibre is not spec compliant on this issue but does replace all named entities with their numeric or character equivalents, which makes not having a DOCTYPE even on epub2 possible but technically against the rules.
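
To make the named-entity point concrete, here is a small Python sketch. It uses Python's standard XML parser as a stand-in for a strict e-reader parser (an assumption; real e-readers vary), shows that `&nbsp;` fails to parse when no DOCTYPE defines it, and then swaps named entities for numeric ones. The helper names `numeric_entities` and `parses` are made up for illustration, and this is not Calibre's actual code, only the general kind of substitution described above.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities that XML itself predefines; these never need a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def numeric_entities(xhtml_text: str) -> str:
    """Replace named entities such as &nbsp; with numeric ones such as &#160;."""
    def swap(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave predefined or unknown names untouched
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", swap, xhtml_text)

def parses(xml_text: str) -> bool:
    """Return True if the text is accepted by a DTD-less XML parse."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

sample = "<p>Tom&nbsp;&amp;&nbsp;Jerry</p>"
fixed = numeric_entities(sample)
print(parses(sample), fixed, parses(fixed))
```

The first parse fails because nothing defines `nbsp`; after the substitution the same markup parses without any DOCTYPE.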

Ashjuk 01-09-2021 05:18 AM

Thanks for the explanation, Kevin.

As I said, I was just curious why it seemed that it was only Sigil that picked up on the missing DOCTYPE.

odamizu 01-09-2021 05:00 PM

Is there a way to tell what changes will be made if you agree to the changes Mend wants to make? i.e., I often find there are Sigil features I'm not aware of and learn about reading these forums, so just checking if I'm missing something that's already there.

I will also admit that the part of the warning that says "Sigil can automatically fix these files, although this may result in minor data loss in extreme circumstances [emphasis added]" always gives me pause, making me want to know exactly what changes are being proposed.

Thank you

BeckyEbook 01-09-2021 06:06 PM

Quote:

Originally Posted by odamizu (Post 4080244)
Is there a way to tell what changes will be made if you agree to the changes Mend wants to make?

What will be changed you cannot see, but ...

If you want to see what exactly Sigil changes during these changes:
1. Open the EPUB file
2. If you see the message, choose [No]
3. Save the checkpoint (Checkpoints > Create Checkpoint for Epub or [🡅] icon)
4. Close the EPUB file (without saving!)
5. Open the same EPUB file again
6. Select [Yes] when you see the message
7. Check what has changed (Checkpoints > Compare Epub against Checkpoint or [±] icon)
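
If you would rather compare outside Sigil, roughly the same before/after check can be sketched in Python, assuming you have saved two copies of the book (one without the fix, one with it). This is not how Sigil's checkpoints work internally; it simply treats each EPUB as a zip archive and diffs the text files. The function names and the suffix list are assumptions for illustration.

```python
import difflib
import zipfile

# File types worth diffing as text; this list is an illustrative assumption.
TEXT_SUFFIXES = (".xhtml", ".html", ".htm", ".opf", ".ncx", ".css")

def read_text_members(epub_path: str) -> dict:
    """Map each text-like file inside the EPUB (a zip archive) to its lines."""
    members = {}
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(TEXT_SUFFIXES):
                text = zf.read(name).decode("utf-8", errors="replace")
                members[name] = text.splitlines()
    return members

def diff_epubs(before_path: str, after_path: str) -> list:
    """Return unified-diff lines for every text file that differs."""
    before = read_text_members(before_path)
    after = read_text_members(after_path)
    lines = []
    for name in sorted(set(before) | set(after)):
        lines.extend(difflib.unified_diff(
            before.get(name, []), after.get(name, []),
            fromfile=f"before/{name}", tofile=f"after/{name}", lineterm="",
        ))
    return lines
```

With two hypothetical files, `print("\n".join(diff_epubs("book_unmended.epub", "book_mended.epub")))` would list every changed line, with added lines such as a new DOCTYPE prefixed by `+`.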

KevinH 01-09-2021 06:11 PM

That message was mostly leftover from the old days when Sigil used HTML Tidy and it would occasionally mess up.

Modern versions of Sigil use the gumbo parser that autorepairs following the exact same rules as major browsers like Safari, Edge, Chrome, Firefox.

Sigil has git checkpointing built in. So before making any change simply run Checkpoint so that you can see the diffs of what changed and even revert to an earlier checkpoint if you so desire.

KevinH 01-09-2021 06:15 PM

FWIW, any version of Sigil in the 0.9.x range long used gumbo Mend to silently fix things like missing doctypes when moving files to old "standard" layouts. Gumbo did that literally for years with no problems. Running gumbo (Mend) is very safe in general.

DiapDealer 01-09-2021 08:08 PM

Quote:

Originally Posted by BeckyEbook (Post 4080261)
What will be changed you cannot see, but ...

If you want to see what exactly Sigil changes during these changes:
1. Open the EPUB file
2. If you see the message, choose [No]
3. Save the checkpoint (Checkpoints > Create Checkpoint for Epub or [🡅] icon)
4. Close the EPUB file (without saving!)
5. Open the same EPUB file again
6. Select [Yes] when you see the message
7. Check what has changed (Checkpoints > Compare Epub against Checkpoint or [±] icon)

Thank you! You saved me the trouble of typing all that. :)

JSWolf 01-09-2021 08:40 PM

Windows 10 Home 64-bit and Sigil 1.4.3 64-bit

I followed the directions just posted. When I click the icon to compare the ePub against the checkpoint, I get "Diff Failed: No checkpoints found". But when I go to Manage Checkpoint Repositories, I see the checkpoint I created.

Am I doing anything wrong?

KevinH 01-09-2021 09:33 PM

The uuid of the opf is used as the checkpoint repo identifier, and one is created if none exists. But if you did not save after the checkpoint, the next time you load that epub yet another new uuid will be created and no match will be found.

So load your epub and do not allow mend. Do the checkpoint. That will add a uuid dc:identifier automatically. Save that file under a new name (to prevent confusion). Now you can either run Mend right away and then do the compare against the checkpoint, or load the newly saved file, run Mend, and compare it to its checkpoint.

This is only an issue for epubs that do not have any uuid dc:identifier to begin with.
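
To check in advance whether a given book falls into that category, something like the following Python sketch could be run on the text of the OPF. How Sigil itself recognizes a uuid identifier is not spelled out in this thread, so the `urn:uuid:` prefix test is an assumption, as are the function name and the made-up sample OPF.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"

def uuid_identifiers(opf_text: str) -> list:
    """Return the dc:identifier values in an OPF that look like UUID URNs."""
    root = ET.fromstring(opf_text)
    found = []
    for el in root.iter(f"{{{DC_NS}}}identifier"):
        value = (el.text or "").strip()
        # Assumption: a "uuid identifier" is one written as urn:uuid:...
        if value.lower().startswith("urn:uuid:"):
            found.append(value)
    return found

# Made-up OPF fragment whose only identifier is a placeholder ISBN.
sample_opf = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="pub-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="pub-id">urn:isbn:9780000000000</dc:identifier>
  </metadata>
</package>"""

print(uuid_identifiers(sample_opf))
```

An empty list suggests the book has no uuid identifier yet, which is the case where saving after the first checkpoint matters.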


All times are GMT -4.