MobileRead Forums - Confused! XHTML, HTML, HTML5, EPUB2, EPUB3???

twobits · 02-20-2013, 04:26 PM

Quote:

Originally Posted by dgatwood

From the top:

SGML stands for Standard Generalized Markup Language. There are many SGMLs. Crudely put, an SGML is any markup language that is characterized by any arbitrary set of tags surrounded by angle braces, with certain bits at the beginning to tell you what type of file it is, and there are probably a few other rules.

Actually, there is only one SGML. It is an ISO standard now (ISO 8879) and is descended from IBM's GML. Overall, though, this was a pretty good summary, except you missed one key piece.

That piece is the DTD, or Document Type Definition. It defines which tags, and which rules for them, make up a valid document of that document type.
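To make the "which tags are allowed where" idea concrete, here is a minimal sketch in Python. It is not a real DTD validator (the standard library does not include one); the ALLOWED_CHILDREN table and the book/chapter/para tag names are made up for illustration, and the check ignores the ordering and repetition rules a real DTD can express.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written content model: parent tag -> allowed child tags.
# A real DTD would state this with declarations such as
# <!ELEMENT book (title, chapter+)>, which also fix order and counts.
ALLOWED_CHILDREN = {
    "book": {"title", "chapter"},
    "chapter": {"title", "para"},
    "title": set(),
    "para": set(),
}

def find_violations(xml_text):
    """Return messages for elements the toy content model does not allow."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag not in ALLOWED_CHILDREN:
        problems.append(f"unknown root element <{root.tag}>")
    # Walk every element and compare its children against the table.
    for parent in root.iter():
        allowed = ALLOWED_CHILDREN.get(parent.tag, set())
        for child in parent:
            if child.tag not in allowed:
                problems.append(f"<{child.tag}> not allowed inside <{parent.tag}>")
    return problems

good = "<book><title>T</title><chapter><title>C</title><para>p</para></chapter></book>"
bad = "<book><para>stray</para></book>"

print(find_violations(good))
print(find_violations(bad))
```

Both sample documents are well-formed XML; only the second breaks the (made-up) rules for this document type, which is the difference between "well-formed" and "valid".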

Quote:

XML is a strict subset of SGMLs in which, among other things, all tags must be matched with a close tag, and a few other details. There are many dialects of XML (an XML dialect is basically just a specific set of allowed tags that can be nested in specific ways), including DocBook, XHTML, property lists, and so on.

Strictly speaking, each of those is not a dialect of XML but a DTD for XML.

Quote:

HTML is an example of an SGML. HTML has a specific set of tags that are considered valid. HTML is not, however, based on XML, because some tags do not have to be closed at all (hr, script when a URL is provided, and so on), and some tags auto-close at the right time (p, li, etc.). [Edit: And, as Turtle91 pointed out, HTML specifies case-insensitive tag and attribute parsing, whereas XML specifies case-sensitive tag and attribute parsing, which, in the case of XHTML's built-in tags and attributes, translates to "all lowercase".]

At first, HTML was only modeled on SGML, but it was more ad hoc than SGML allowed. It was not until later (4.0 or 3.2, I can't recall which offhand) that it was given a formal DTD that made it true SGML.
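The difference in strictness described in the quote above can be seen with Python's standard library. This is only a sketch: html.parser is a tolerant tokenizer that reports tags as it sees them (it does not apply HTML's auto-close rules or build a tree), and the helper names TagCollector, html_start_tags and is_well_formed_xml are made up for this example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

class TagCollector(HTMLParser):
    """Records every start tag the HTML tokenizer reports."""
    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us the tag name already converted to lowercase.
        self.start_tags.append(tag)

def html_start_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags

def is_well_formed_xml(markup):
    """True if the XML parser accepts the markup, False on a parse error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

sloppy = "<DIV><P>one<p>two<HR></DIV>"          # unclosed p and hr, mixed case
case_mismatch = "<div><P>one</p></div>"         # P and p are different names in XML
strict = "<div><p>one</p><p>two</p><hr/></div>" # everything closed, consistent case

print(html_start_tags(sloppy))
print(is_well_formed_xml(sloppy))
print(is_well_formed_xml(case_mismatch))
print(is_well_formed_xml(strict))
```

The HTML tokenizer gets through the sloppy markup without complaint and lowercases the tag names, while the XML parser rejects both the unclosed tags and the case mismatch.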

Quote:

HTML5 is a specific version of HTML. Like all HTMLs, it is an SGML, but HTML5 files are not (necessarily) proper XML.

XHTML is a special form of HTML that has been modified slightly so that every XHTML file is a proper XML file that conforms to the stricter XML standards. This requires a few tiny tweaks around the fringes, but it mostly looks like HTML with some extra close tags or self-closing tags.

Right about XHTML, but it is probably worth noting that XHTML is simply a DTD for XML.

Quote:

XSLT is another XML dialect. An XSLT stylesheet provides a set of rules for transforming from one XML dialect to another (typically, though in practice, it can be used to translate a specified XML dialect into pretty much anything, up to and including LaTeX commands).

Actually, XSLT is a Turing-complete language. To use it you also need to learn XPath, and you will often end up learning XQuery as well.
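For a feel of what XPath does, here is a small sketch using Python's standard library. Note the assumptions: this is not XSLT (the standard library has no XSLT processor), ElementTree supports only a limited subset of XPath, and the catalog document is made up for the example.

```python
import xml.etree.ElementTree as ET

# Made-up sample document.
catalog = ET.fromstring("""
<catalog>
  <book lang="en"><title>First</title></book>
  <book lang="fr"><title>Deuxieme</title></book>
  <book lang="en"><title>Third</title></book>
</catalog>
""")

# XPath-style expression: titles of the book children whose lang attribute is "en".
# In an XSLT stylesheet, a similar expression would go in a select="..." attribute.
english_titles = [t.text for t in catalog.findall("./book[@lang='en']/title")]

print(english_titles)
```

The path expression is the part that carries over to XSLT; the templates and output rules built around such expressions are what the stylesheet language itself adds.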

Quote:

EPUB2 and EPUB3 are versions of EPUB. EPUB2 uses XHTML under the hood. EPUB3 uses HTML5, but it must be parseable as XML. So it must be a polyglot XML/HTML5 document. This polyglot is called XHTML5, but is defined as part of the HTML5 standard rather than in a separate standard as previous XHTML versions were.
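The "must be parseable as XML" requirement can be sketched like this. The two sample chapters are made up, parses_as_xml is a hypothetical helper, and the check covers XML well-formedness only; it says nothing about whether a file is a conforming EPUB3 content document.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Ordinary HTML5 syntax: meta and br are left unclosed.
html5_style = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>Chapter 1</title></head>
<body><p>First line<br>second line</p></body>
</html>"""

# Same content written so that an XML parser accepts it.
xhtml5_style = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><meta charset="utf-8"/><title>Chapter 1</title></head>
<body><p>First line<br/>second line</p></body>
</html>"""

def parses_as_xml(markup):
    """True if the XML parser accepts the markup, False on a parse error."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

print(parses_as_xml(html5_style))
print(parses_as_xml(xhtml5_style))

# Once parsed as XML, element names carry the XHTML namespace.
root = ET.fromstring(xhtml5_style)
print(root.tag == f"{{{XHTML_NS}}}html")
```

The only changes between the two samples are the namespace declaration and the self-closing meta and br tags, which is the kind of "tiny tweak around the fringes" the earlier quote describes.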

Clear as mud?

Damn alphabet soup! I hate XML! lol