Old 04-18-2014, 11:17 AM   #17
skreutzer
Software Developer
skreutzer considers 'yay' to be a thoroughly cromulent word.
 
Posts: 190
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
That's quite funny, it seems you don't have the faintest clue what you are talking about... XML is the universal format for structuring any kind of text-based data, and there are lots of powerful tools available to read and write XML files in all kinds of XML-based formats. This way, programmers don't need to write parsers over and over again just because the technical foundation of a text-based file is some kind of custom markup syntax. Actually, the world is moving more and more towards XML-based formats, which you can observe with XHTML and EPUB. You might recall that it started out with an even more complex syntactical foundation for structuring text-based files called SGML, from which XML developed as a simplified version. HTML was initially intended to promote the semantic web and was therefore centered around the concepts of SGML (and later XML, via XHTML), but then crappy HTML editors and browser-specific rendering made this vision impossible. Slowly the web is recovering from this mess and aiming towards a semantically readable web again, of which EPUB can be a part and in which content/display separation with XHTML/HTML5 + CSS plays an important role.

I just want to point out that scientific publishing is heavily involved in a transformation towards digital publishing workflows based upon XML, that the entire e-book sector evolved around XML, that the web with DOM and AJAX has greatly benefited from XML, and that all publishing in general, digital and/or print, will be done with XML-based workflows at some point in time. I guess you're quite familiar with word processor formats like ODT or DOCX, and it isn't a coincidence that those formats are based upon XML, too. Even Calibre is probably engaged with lots of XML-based input files, right?

I completely agree that <span style="text-decoration:underline"> isn't a much better solution, because this would distribute the direct formatting of underline all over the document, which is contrary to the concept of semantic markup. Instead, the ideal solution is to not provide an “underline” at all, because that only specifies the visual appearance of words and sentences instead of their meaning. If an author wants to underline something, he usually does it with an intention in mind, for instance, to underline all important words or those words which were added in comparison to a previous sentence. This intention gets lost if only “underline” is written into the file. It would be much better to define CSS classes “important” and “addition” and leave it up to the display/processing software to make sense of them (for instance, if all added words should be extracted or removed or displayed in red font), while the associated CSS class may still provide text-decoration:underline; as the default visual appearance.
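To illustrate what "leave it up to the processing software" could look like, here is a minimal sketch in Python using only the standard library. The sample paragraph and the function name texts_with_class are made up for this post, and the class names are just the two examples from above; nothing here is taken from Calibre.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Made-up sample paragraph using the two example classes from this post.
SAMPLE = (
    '<p xmlns="http://www.w3.org/1999/xhtml">The '
    '<span class="important">deadline</span> is Friday, '
    '<span class="addition">not Thursday</span>.</p>'
)

def texts_with_class(xhtml, class_name):
    """Return the text of every <span> that carries the given class."""
    root = ET.fromstring(xhtml)
    found = []
    for span in root.iter("{%s}span" % XHTML_NS):
        # The class attribute is a whitespace-separated list of names.
        if class_name in span.get("class", "").split():
            found.append("".join(span.itertext()))
    return found

print(texts_with_class(SAMPLE, "addition"))
print(texts_with_class(SAMPLE, "important"))
```

With a plain "underline" in the file, the two spans would be indistinguishable and this kind of extraction wouldn't be possible.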

It might be that Calibre isn't a semantic text editor and users expect an “underline” button, which loses the intention the user had while pressing it as long as it just underlines a marked word. But even if that's the case, an XHTML 1.1 + EPUB2 valid and non-directly-formatting solution would still be <span class="underline"> with a CSS class .underline { text-decoration: underline; } or something like it. Especially in HTML5, <u> has a different meaning and is advised to be avoided (in favor of more semantic markup like <em>), so at the moment browsers and e-readers might still render <u> elements as underlined, but that doesn't necessarily have to be the case in the future.
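Just to show that such a conversion is mechanical, here is a rough sketch in Python (standard library only). It assumes the content is well-formed XHTML in the XHTML namespace; the function name replace_u_with_span is invented for this example, and this is not how Calibre does or should do it internally.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without a namespace prefix.
ET.register_namespace("", XHTML_NS)

def replace_u_with_span(xhtml, class_name="underline"):
    """Turn every <u> element into <span class="underline">."""
    root = ET.fromstring(xhtml)
    # Collect first, then modify, so the traversal is not affected.
    for elem in list(root.iter("{%s}u" % XHTML_NS)):
        elem.tag = "{%s}span" % XHTML_NS
        # Keep any classes the element already had.
        classes = elem.get("class", "").split()
        if class_name not in classes:
            classes.append(class_name)
        elem.set("class", " ".join(classes))
    return ET.tostring(root, encoding="unicode")

converted = replace_u_with_span(
    '<p xmlns="http://www.w3.org/1999/xhtml">A <u>word</u> here.</p>'
)
print(converted)
```

The stylesheet then only needs the single rule .underline { text-decoration: underline; } mentioned above.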

However, I guess Calibre is already making heavy use of CSS classes, so you could probably fix this issue comparatively easily and users wouldn't run into validation issues any more. Regardless of what you think about XML validation, there are lots of people who build their workflows with and around XML, so XML validation won't go away any time soon.

I would be very interested in discussing semantic markup concepts in general and maybe even for Calibre, but I can imagine that you're actually not interested in it at all. Semantic markup isn't an absolute requirement if the documents are of a basic nature; it's only an additional benefit. However, avoiding direct formatting is quite important, because whenever an EPUB gets transformed to another format, the visual definitions of CSS become void while the class names get matched to their equivalents in the formatting description syntax of the target format.
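As a sketch of that last point, here is how class names might be matched to a target format, with LaTeX as an arbitrary example target. The mapping table CLASS_TO_LATEX and the function names are purely hypothetical, chosen for illustration; the code ignores the CSS entirely, only looks at class names, and does no LaTeX escaping.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from class names to LaTeX commands.
CLASS_TO_LATEX = {
    "important": "\\emph",
    "addition": "\\textbf",
    "underline": "\\underline",
}

def render(elem):
    """Render an element and its children, wrapping mapped classes."""
    inner = elem.text or ""
    for child in elem:
        inner += render(child) + (child.tail or "")
    # Use the first class name that has an equivalent in the target format.
    for name in elem.get("class", "").split():
        if name in CLASS_TO_LATEX:
            return CLASS_TO_LATEX[name] + "{" + inner + "}"
    return inner

def xhtml_to_latex(xhtml):
    return render(ET.fromstring(xhtml))

print(xhtml_to_latex(
    '<p xmlns="http://www.w3.org/1999/xhtml">The '
    '<span class="important">deadline</span> moved.</p>'
))
```

A directly formatted style="text-decoration:underline" would give such a converter nothing to match on without parsing the CSS as well.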