MobileRead Forums - View Single Post

dgatwood · 12-02-2015, 08:42 PM

Quote:

Originally Posted by Sarmat89

It doesn't even make sense. XHTML must be a well-formed XML and have some kind of structure. It is no any different from a pure XML format.

I assume by "XML", Hitch meant "semantic markup". XML is just a specific set of rules for markup, where all of the tags must be balanced, the attributes have to be properly quoted, etc. It says nothing about the tags themselves.

XHTML is presentational markup (like HTML), and it just happens to be in XML format.

Most of the time, though, when we talk about XML, we're talking about good XML—semantic markup. That means that the tags themselves are meaningful.

HTML is partially semantic. Paragraph tags, though abbreviated, have meaning. A paragraph is a structural unit of content. Emphasis tags are another example. However, HTML and XHTML are more typically used presentationally, with tags like b (bold), font, hr (horizontal rule), etc., which define the appearance of the content rather than the structure of it.

To give you an example, here's the semantic markup for the first bit of the first chapter of "A Patriots Christmas" (a short story that is part of my Patriots book series):

Code:

<chapter>
<title>About eleven years after the events in Beyond the Veil</title>
<subtitle>(Dec. 24, 2401)</subtitle>
<para>’Twas the night before Christmas, and all through the house, Amanda was cleaning egg nog from her blouse, when what to her wondering eyes should appear, but Jen and Marc and their daughter so dear.</para>
<para>As Joseph ran to the door to see what was the matter, Amanda shouted down at him. “Don’t forget that Jen and Marc are coming over tonight, so be on your best behavior.”</para>
<para>Joseph smirked at the lack of poetry, then threw open the door like a flash, just for good measure.</para>
<para>“Marc, Jen! It’s so good to see you both!” he said. “And who is this dashing young lady?”</para>

...

</chapter>

Other chapters have a series of para tags followed by one or more <section> tags that each wrap a bunch more <para> tags.

When I translate the markup from semantic markup (DocBook, in this case) to presentational markup (XHTML, in this case), the <chapter> tag effectively becomes a new file containing all of this:

Code:

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:svg="http://www.w3.org/2000/svg" xmlns:epub="http://www.idpf.org/2007/ops">
<head>
<title>Chapter I: About eleven years after the events in Beyond the Veil</title>
<link rel="stylesheet" type="text/css" href="nookstyles.css" />
<link rel="stylesheet" type="text/css" href="nookstyles2.css" />
</head>
<body>
<div class="chapter">
<div class="chapterheadbox"><div class="chapterheading">Chapter</div><div class="chapternumber">One</div>
</div><div class="title">About eleven years after the events in Beyond the Veil</div>

<div class="subtitle">(Dec. 24, 2401)</div>
<p>...</p>
...
</div>
</body>
</html>

and when there are section tags, you get

Code:

<div class="section">
...
</div>

wrapped around them, but if there are two sections back-to-back, you also get

Code:

<div class="sectionmark"><span>***</span></div>

between them. Notice that this mark div has absolutely no purpose other than presentation. (Of course, were it not for bugs in some ePUB readers, I'd be using CSS instead, but that's another matter.)
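A minimal sketch of that section handling, in Python with the standard library's xml.etree.ElementTree. This is not the toolchain used for the book; the function name sections_to_divs and the sample chapter are made up for illustration, and only the class names come from the output shown above.

```python
import xml.etree.ElementTree as ET

def sections_to_divs(chapter_xml):
    """Convert <para> and <section> children of a <chapter> into XHTML divs."""
    chapter = ET.fromstring(chapter_xml)
    out = ET.Element("div", {"class": "chapter"})
    previous_was_section = False
    for child in chapter:
        if child.tag == "section":
            if previous_was_section:
                # Purely presentational separator between back-to-back sections.
                mark = ET.SubElement(out, "div", {"class": "sectionmark"})
                ET.SubElement(mark, "span").text = "***"
            div = ET.SubElement(out, "div", {"class": "section"})
            for para in child.findall("para"):
                ET.SubElement(div, "p").text = para.text
        elif child.tag == "para":
            ET.SubElement(out, "p").text = child.text
        # Titles and other elements are skipped in this sketch.
        previous_was_section = child.tag == "section"
    return out

# Hypothetical input: one loose paragraph, then two back-to-back sections.
sample = (
    "<chapter><para>Intro.</para>"
    "<section><para>First scene.</para></section>"
    "<section><para>Second scene.</para></section></chapter>"
)
print(ET.tostring(sections_to_divs(sample), encoding="unicode"))
```

The separator is emitted only when the previous sibling was also a section, which is why it never shows up in the semantic source.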

Also notice that the chapter tag exploded into the word chapter followed by a chapter number (written out), each in its own container, each styled separately, none of which was present in the original content.

What makes semantic markup useful is that software can examine it and reason about it programmatically. If a piece of software sees a bunch of random div elements, it has no way to know that a sectionmark div appears between sections. But if it sees a section tag, it knows that those are sections. And, for more complicated works such as programming documentation, it knows the difference between a section that's at the top level of a chapter and a section that's inside another section. If, for example, that software converts it to presentational markup, it might indent each nesting level further than the last.
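As a rough illustration of a tool reasoning about nesting, here is a small Python sketch that walks section tags and reports how deep each one sits. The function name section_depths and the sample document are assumptions made up for the example; it only uses the standard library.

```python
import xml.etree.ElementTree as ET

def section_depths(element, depth=1):
    """Yield (depth, title) for every nested <section>, in document order."""
    for section in element.findall("section"):
        yield depth, section.findtext("title", default="")
        # A section inside this one sits one level deeper.
        yield from section_depths(section, depth + 1)

# Hypothetical chapter with one nested section.
doc = """<chapter>
<section><title>Overview</title>
  <section><title>Details</title></section>
</section>
<section><title>Reference</title></section>
</chapter>"""

for depth, title in section_depths(ET.fromstring(doc)):
    # Indent each nesting level further than the last.
    print("  " * (depth - 1) + title)
```

With plain div elements there is nothing equivalent to look for unless every producer happens to agree on the same class names.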

The chapter number is a great example of this. When I started writing the content, I was dealing with manual chapter numbers, and it was a nightmare to keep fixing them every time I added a new chapter break. By changing the markup to simply treat each chapter as a unit, that becomes trivial. When the software produces the actual output, it just counts the chapters as it goes, and puts in the correct number.
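The counting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual conversion tool: the book wrapper element, the NUMBER_WORDS table, and the function name chapter_headings are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup for spelling out small chapter numbers.
NUMBER_WORDS = ["One", "Two", "Three", "Four", "Five",
                "Six", "Seven", "Eight", "Nine", "Ten"]

def chapter_headings(book_xml):
    """Return (number word, title) pairs, numbering chapters by position."""
    root = ET.fromstring(book_xml)
    headings = []
    for index, chapter in enumerate(root.iter("chapter")):
        # Fall back to digits once the word table runs out.
        word = NUMBER_WORDS[index] if index < len(NUMBER_WORDS) else str(index + 1)
        headings.append((word, chapter.findtext("title", default="")))
    return headings

# Hypothetical book; note that no chapter carries its own number.
book = """<book>
<chapter><title>First</title><para>...</para></chapter>
<chapter><title>Inserted later</title><para>...</para></chapter>
<chapter><title>Last</title><para>...</para></chapter>
</book>"""

for word, title in chapter_headings(book):
    print(f"Chapter {word}: {title}")
```

Because the number comes from the chapter's position, inserting a chapter in the middle renumbers everything after it without touching the content.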

And in nonfiction books, semantic markup can be even more meaningful. For example, when writing developer documentation, we would put certain bits of text in code font (monospace). Had we used presentational markup, all of these would be indistinguishable from one another. However, if people mark them up correctly, you can tell whether that bit of text is a function (which should ideally be auto-linked to the function's documentation), a constant (same), the name of a command-line tool (which should be linked to a very different kind of documentation), etc.

None of those differences matter to the end reader. However, they can be important to tools that operate on the content, and they give you (as the CSS creator) the ability to change formatting later on. For example, if you later decide to change your house style so that all the function names end with (), you can add a tiny bit of CSS (using the ::after pseudo-element and the content property), and every one of those function names now ends with (), while the constants (which were formatted the same way) do not.
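Here is a hedged Python sketch of how that distinction could survive the conversion. It renames DocBook's function, constant, and command inline elements to code elements and keeps the original element name as a class. The class names, the INLINE_CODE_TAGS set, and the function name to_presentational are assumptions for illustration, not what any particular toolchain emits.

```python
import xml.etree.ElementTree as ET

# DocBook inline elements that all render in the same monospace font.
INLINE_CODE_TAGS = {"function", "constant", "command"}

def to_presentational(para_xml):
    """Turn a DocBook <para> into an XHTML <p>, keeping semantics as classes."""
    para = ET.fromstring(para_xml)
    para.tag = "p"
    for element in para.iter():
        if element.tag in INLINE_CODE_TAGS:
            # Remember what the text was before flattening it to <code>.
            element.set("class", element.tag)
            element.tag = "code"
    return ET.tostring(para, encoding="unicode")

# Hypothetical documentation sentence.
source = ("<para>Call <function>open</function> with "
          "<constant>O_RDONLY</constant>.</para>")
print(to_presentational(source))
```

With output like this, a rule such as code.function::after { content: "()"; } would touch only the function names and leave the constants alone, even though both look identical by default.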

It is basically just like what you do in XHTML with div tags and classes, except that the tag names are standardized, which means that there are tools out there that can work with the content across organizational boundaries, confident that a paragraph really is a paragraph and a function name really is a function name.