Quote:
Originally Posted by stumped
Meanwhile yes I can tweak my workflow. It's not understanding what's happening at code level that bugs me.
|
If you open an XHTML (or HTML) file in Sigil, at the very top, you'll see something like this:
Code:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
That DOCTYPE is the code that is being added.
- - -
Note: In your very specific Modify EPUB example, there's a single file—the new cover file—that's missing this DOCTYPE.
- - -
What's That DOCTYPE Code Saying?
"Hey! I'm written in XHTML 1.1."
Programs can use this to determine what the heck is in the file, instead of trying to completely guess.
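To make that concrete, here's a rough Python sketch of how a program could peek at the DOCTYPE. The helper name `sniff_doctype` and the regex are my own illustration (an assumption, not Sigil's or Calibre's actual code):

```python
import re

# Hypothetical helper, for illustration only: pull the public identifier
# out of an "html" DOCTYPE declaration, if there is one.
DOCTYPE_RE = re.compile(
    r'<!DOCTYPE\s+html(?:\s+PUBLIC\s+"([^"]*)")?[^>]*>',
    re.IGNORECASE,
)

def sniff_doctype(text):
    """Return a short description of what the DOCTYPE claims."""
    match = DOCTYPE_RE.search(text[:1024])  # the DOCTYPE sits near the top
    if match is None:
        return "no DOCTYPE (the program has to guess)"
    public_id = match.group(1)
    if public_id is None:
        return "short DOCTYPE with no public identifier"
    return public_id

sample = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"\n'
    '"http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml">'
)
print(sniff_doctype(sample))
print(sniff_doctype("<html><body><p>Cover</p></body></html>"))
```

The first call reports the XHTML 1.1 identifier; the second has nothing to go on, which is the situation the cover file is in.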
Reason 1: Rules
This then sets up some "rules" the program can follow. For example:
- In XHTML, everything that opens <p> needs to close </p>.
- In HTML, you can have super ugly stuff like <p> <p> <p> without ever closing it.
If the DOCTYPE is missing, the program will have to guess based on filenames... or just assume it's HTML.
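If you want to see the "everything that opens must close" rule in action, here's a small Python sketch. It uses the standard library's strict XML parser as a stand-in for an XHTML-aware program (that substitution is my assumption; real tools do much more than this):

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup):
    """True if a strict XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

closed = "<body><p>One</p><p>Two</p></body>"  # XHTML style: every <p> is closed
unclosed = "<body><p>One<p>Two</body>"        # HTML style: <p> never closed

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

A forgiving HTML parser would happily accept the second string. The strict XML rules that XHTML follows reject it.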
- - -
Sigil, when opening and popping up that warning, is saying:
"Hey! This thing isn't 100% correct according to the specs. Add a DOCTYPE!"
Calibre is saying:
"Meh, it's just HTML or XHTML."
- - -
Reason 2: Named (or Numbered) Entities
"Named Entities" are stuff like:
- &amp; = &
- &lt; = <
- &gt; = >
"Numbered Entities" are stuff like:
- &#8220; = “
- &#8221; = ”
- &#8224; = †
In EPUB2, both versions are allowed.
In EPUB3, only the numbers are allowed (apart from the five entities built into XML itself: &amp;, &lt;, &gt;, &quot;, &apos;).
What Sigil/Calibre do, when fixing your files, is:
1. Convert all entities into their proper Unicode characters.
2. In EPUB3, convert to the numbered version if needed.
So everything changes into Unicode Characters:
- &quot; → "
- &ldquo; → “
- &rdquo; → ”
- &dagger; → †
And only a few change to numbered form. The most famous one is:
- &nbsp; → &#160; = Non-Breaking Space
Those are pretty much the major code changes that Sigil/Calibre make when you say "Fix my files."
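Here's a rough Python sketch of that entity cleanup. The function name `normalize_entities` is made up for illustration, and I'm assuming it only runs on text content (not attribute values). It is not the actual Sigil or Calibre code. Names missing from Python's HTML 4 table are simply left alone.

```python
import re
from html.entities import name2codepoint  # HTML 4 entity names only

ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
KEEP_ESCAPED = {38, 60, 62}  # & < > must stay escaped or the markup breaks
NBSP = 160

def normalize_entities(text):
    """Turn entities into Unicode characters; write nbsp as &#160;."""
    def replace(match):
        body = match.group(1)
        if body.startswith("#"):
            # Numbered entity, decimal (&#8220;) or hex (&#x201C;)
            code = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
        elif body in name2codepoint:
            code = name2codepoint[body]  # named entity, e.g. &ldquo;
        else:
            return match.group(0)  # unknown name: leave untouched
        if code in KEEP_ESCAPED:
            return match.group(0)
        if code == NBSP:
            return "&#160;"  # invisible character, so keep it visible in code
        return chr(code)
    return ENTITY_RE.sub(replace, text)

print(normalize_entities("&ldquo;Hi&rdquo;&nbsp;&dagger; &amp; &lt;p&gt; &#8220;"))
```

Keeping the non-breaking space in numbered form is a design choice: the raw character looks exactly like a normal space in an editor, so the numbered entity makes it obvious.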
- - -
Note: If you want to read even more of the technical details on DOCTYPE, see the HTML specs:
- - -
Side Note: There's also been tons of topics over the years where we've discussed stuff like:
- Differences between HTML vs. XHTML
- Named vs. Numbered Entities
Just type into your favorite search engine:
Code:
whatever term you want Tex2002ans site:mobileread.com
and you'll probably stumble across all those topics over the years.
Quote:
Originally Posted by stumped
Maybe I should read an idiots guide to doc types and why they matter, or don't matter , in epub?
|
Just push the button and trust Sigil. :P
It's like going from 99.999% correct to 100% correct.
KevinH (+ Sigil) says follow the specs.
Kovid (+ Calibre) says that .001% doesn't matter in reality.
(This is all going based off memory. It's buried somewhere in those previous DOCTYPE MobileRead topics.)