04-18-2014, 09:50 AM | #16 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The world has moved on from XML. You need to stop clinging to XML in your workflows. A serious publishing tool, as you put it, should deal with actual, real-world HTML and produce code that is tested to work in actual real world software, rather than a DTD/spec that is a guarantee of precisely nothing.
A <u> tag is a much nicer construct than <span style="text-decoration:underline">, which is what the epub 2 spec would require here. Not only is it nicer, it actually works everywhere and has the added advantage of allowing easy customization of styles via CSS overrides. All a user has to do is add u { text-decoration: none } to turn off underlines, which cannot be done with the style attribute. Sometimes you have to look beyond the spec and implement a solution that is actually superior to what the spec recommends. |
04-18-2014, 11:17 AM | #17 |
Software Developer
Posts: 189
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
|
That's quite funny, it seems you don't have the faintest clue what you are talking about... XML is the universal format for structuring any kind of text-based data, and there are lots of powerful tools available to read and write XML files in all kinds of XML-based formats. This way, programmers don't need to write parsers over and over again just because the technical foundation of a text-based file is some kind of custom markup syntax. Actually, the world is moving more and more towards XML-based formats, which you might even observe with XHTML and EPUB. You might recall that it started out with an even more complex syntactical foundation for structuring text-based files called SGML, from which XML developed as a simplified version. HTML was initially intended to promote the semantic web and therefore was centered around the concepts of SGML and XML, but then crappy HTML editors and browser-specific rendering made this vision impossible, so slowly the web is recovering from this mess and aiming towards a semantically readable web again, of which EPUB can be a part and in which content/display separation with XHTML/HTML5 + CSS plays an important role.
I just want to point out that scientific publishing is heavily involved in a transformation towards digital publishing workflows based upon XML, that the entire e-book sector evolved around XML, that the web with DOM and AJAX has greatly benefited from XML, and that all publishing in general, digital and/or print, will be done with XML-based workflows at some point in time. I guess you're quite familiar with word processor formats like ODT or DOCX, and that those formats too are based upon XML isn't a coincidence. Even Calibre is probably engaged with lots of XML-based input files, right? I completely agree that <span style="text-decoration:underline"> isn't a much better solution, because this would distribute the direct formatting of underline all over the document, which is contrary to the concept of semantic markup. Instead, the ideal solution is to not provide an “underline” at all, because that only specifies the visual appearance of words and sentences instead of their meaning. If an author wants to underline something, he usually does it with an intention in mind, for instance, to underline all important words or those words which are added in comparison to a previous sentence. This intention gets lost if only “underline” is written into the file. Instead, it would be much better to define CSS classes “important” and “addition” and leave it up to the display/processing software to make sense of them (for instance, if all added words should be extracted or removed or displayed in red font), while the associated CSS class may still provide text-decoration:underline; as the default visual appearance.
It might be that Calibre isn't a semantic text editor and users expect an “underline” button which loses the intention a user had while pressing it, as long as it just underlines a marked word, but even if that's the case, an XHTML 1.1 + EPUB2 valid and non-directly-formatting solution still would be <span class="underline"> with a CSS class .underline { text-decoration: underline; } or something like it. Especially in HTML5, <u> has a different meaning and authors are advised to avoid it (in favor of more semantic markup like <em>), so at the moment browsers and e-readers might still render <u> elements as underlined, but that doesn't necessarily have to be the case in the future. However, I guess Calibre is already making heavy use of CSS classes, so probably you could fix this issue comparatively easily, so that users don't run into validation issues any more, because regardless of what you think about XML validation, there are lots of people who build their workflows with and around XML, so XML validation won't go away any time soon. I would be very interested in discussing semantic markup concepts in general and maybe even for Calibre, but I can imagine that you're actually not interested in it at all. It isn't an absolute requirement if the documents are of basic nature, and only an additional benefit. However, avoiding direct formatting is quite important, because whenever an EPUB gets transformed to another format, the visual definitions of CSS become void while the class names will get matched to their equivalent in the formatting description syntax of the target format. |
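For anyone who wants to try the <span class="underline"> route on existing files, here is a minimal Python sketch using only the standard library. It is not how Calibre works internally; the function name replace_u_with_span and the class name "underline" are made up for illustration, and it assumes the input is well-formed XHTML in the XHTML namespace.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Serialize XHTML elements without an "ns0:" prefix (process-wide setting).
ET.register_namespace("", XHTML_NS)


def replace_u_with_span(xhtml, css_class="underline"):
    """Rewrite every XHTML <u> element as <span class="...">, keeping its content.

    Hypothetical helper for illustration; the class name is an assumption.
    """
    root = ET.fromstring(xhtml)
    # Materialize the matches first so renaming tags cannot disturb the traversal.
    for elem in list(root.iter("{%s}u" % XHTML_NS)):
        elem.tag = "{%s}span" % XHTML_NS
        existing = elem.get("class")
        elem.set("class", (existing + " " + css_class) if existing else css_class)
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>An <u>underlined</u> word.</p>"
    "</body></html>"
)
converted = replace_u_with_span(sample)
```

The stylesheet would still need a matching rule such as .underline { text-decoration: underline; } for the text to stay underlined. Note that ElementTree drops the DOCTYPE and the XML declaration on output, so a real tool would have to put those back.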
04-18-2014, 11:21 AM | #18 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Christ on a cross. http://www.whatwg.org/specs/web-apps...e/parsing.html
HTML is parsed by a single, well defined, algorithm. XHTML is history. You go on believing whatever you like, I have neither the time nor the inclination to debate with you. |
04-18-2014, 11:29 AM | #19 |
Software Developer
Posts: 189
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
|
How can you be this ignorant about even basic technical concepts? Every file that ever needs to be read needs to be parsed, regardless of what format it is written in. With XML, however, one gets the benefit of using a predefined XML parser instead of re-implementing that algorithm over and over again. Additionally, with validated XML, the parsing algorithm doesn't need to be dumb regarding the input file; it can already know all elements that might occur and where they might occur, and react to them. Without it, one has to write a lot of extra code for all kinds of nonsense input for which one can't even know what to do in terms of rendering or transformation.
Abstraction is of great benefit for programming in general; programming libraries save the developer lots of time otherwise spent re-implementing the same methods over and over again. XML, HTML, EPUB can be considered standardized common protocols on which standardized common programming libraries operate. Violating those protocols amounts to breaking the system. Last edited by skreutzer; 04-18-2014 at 11:39 AM. |
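The difference between the two parsing models argued over here can be seen in a small Python sketch. It uses the standard library only. One caveat: html.parser is a lenient tokenizer, not an implementation of the WHATWG parsing algorithm linked above, so it only illustrates the "accept whatever comes in" side roughly. The names is_well_formed_xml, TagCollector and html_start_tags are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


def is_well_formed_xml(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML tokenizer that records start tags instead of rejecting input."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Feed markup to the lenient tokenizer and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


good = "<p>An <u>underlined</u> word.</p>"
bad = "<p>An <u>underlined word.</p>"  # the <u> is never closed

strict_good = is_well_formed_xml(good)
strict_bad = is_well_formed_xml(bad)
lenient_tags = html_start_tags(bad)
```

The strict parser refuses the second fragment outright, while the lenient one carries on and leaves it to the caller to decide what an unclosed <u> should mean.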
04-18-2014, 01:01 PM | #20 | |
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
|
Quote:
Other formats are on the rise, looser and also easier to read. At least in my field the use of XML is declining. In ePUB there is actually no need for X(HT)ML as opposed to HTML. It makes it more complex and adds nothing. |
|
04-18-2014, 01:44 PM | #21 |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Why are we discussing xml at all? @skreutzer, your problem is with calibre having the <u> tag, which isn't about xml; it is about html and the epub standard.
As far as xml goes, the editor doesn't push bad xml on you. What are you arguing about, exactly?? |
04-18-2014, 03:16 PM | #22 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
04-18-2014, 03:56 PM | #23 | |
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Quote:
.... .... The <u> tag gets closed by a </u> tag... While very informative about the general nature of epub and xhtml, I am sure, I still fail to see what that has to do with <u> being evil according to the epub spec. Also, I knew everything you just said; in no way did I contradict it. We are still only dealing with the html aspect of epub/xhtml, since that is what the <u> tag (and any discussion about it within the context of calibre and/or Kovid breaking the holy standards) actually touches on. |
|
04-18-2014, 04:09 PM | #24 |
Software Developer
Posts: 189
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
|
@Toxaris:
What is your impression, are those integration problems caused by proprietary applications and their questionable data formats or by the lack of tools that would translate and interact between XML formats if it were in XML? At least for publishing, there are huge advantages in XML-based workflows, and it is widely used in various ways. What kind of formats are increasing in application integration, binary ones with corresponding APIs to access them?
@eschwartz:
I frequently run into people who have problems with Calibre EPUB files (with reading them or using them for online services), and a portion of those problems are due to the lack of standard conformance, which is absolutely avoidable, but forces users to fix those errors while they usually don't know how. I myself develop processing tools which convert to EPUB, especially to build automated workflows based upon semantic markup. I won't produce invalid output, I'll refuse invalid input, I'll support front ends which are compliant with the various standards (so all conversions in the workflow from the first character typed to the printed hardcover book will be executed over well-defined interfaces). I don't support EPUB2 input yet, and I don't have a problem with <u> since I don't use Calibre (the issue wasn't initially posted by me). However, if file editors are sloppy with their output, a lot of files need to be manually adjusted in the future or are less usable than they could be because visual appearance was encoded instead of meaning. A developer of EPUB2 or XHTML 1.1 processing or reading software would not only be expected to support all elements that are specified by the standards, but also to react to the old crappy legacy stuff like <marquee> etc. This bloats code for no use at all, and if the task is to just do some primitive transformations, it is far better to ignore all the old, deprecated non-standard constructs which shouldn't be there anyway. |
04-18-2014, 04:15 PM | #25 | |
Grand Sorcerer
Posts: 11,470
Karma: 13095790
Join Date: Aug 2007
Location: Grass Valley, CA
Device: EB 1150, EZ Reader, Literati, iPad 2 & Air 2, iPhone 7
|
Quote:
Dale |
|
04-18-2014, 04:18 PM | #26 |
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Oh what a tangled web we weave...
There really is no epub "standard" when all is said and done. There's just one organization's published "specifications" that the rest of the world cherrypicks and deviates from as they see fit. It's a glorified wishlist... and it'll never really be anything but. |
04-18-2014, 04:22 PM | #27 |
Software Developer
Posts: 189
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
|
As far as I know, there's at least no commonly accepted Calibre EPUB standard, so I stick with the IDPF and W3C, which both represent company entities as well as individuals and groups.
|
04-18-2014, 04:29 PM | #28 | ||
Ex-Helpdesk Junkie
Posts: 19,422
Karma: 85397180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
@skreutzer calibre is actually quite good about xml; it will only convert books into strictly xhtml-compliant code. So since the problem with the <u> tag has nothing to do with conformance to xml, but is rather an html and by extension an html-as-it-relates-to-xhtml problem, can we stop dragging xml workflow into it? Even though it is a cause you care about?
Quote:
Quote:
Last edited by eschwartz; 04-18-2014 at 04:33 PM. |
||
04-18-2014, 04:39 PM | #29 |
Software Developer
Posts: 189
Karma: 89000
Join Date: Jan 2014
Location: Germany
Device: PocketBook Touch Lux 3
|
Just think about this issue as follows: EPUB gets processed with XML tools. XML tools use XML schema validation. The validation fails for an invalid EPUB input file. What are web services supposed to do in such a case? The initial post about <u> invalidity didn't mention if this issue was raised by a web service or processing tool or if the user just did a validation voluntarily, but at least he posted the issue and asked for a solution. Can he solve it in Calibre? Manually or automatically?
Last edited by skreutzer; 04-18-2014 at 04:44 PM. |
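To at least find where the <u> elements are before deciding how to fix them, something like the following Python sketch could be used. It is not a schema validator and not a Calibre feature; the function name find_flagged_elements and the FLAGGED set are assumptions made for illustration. It simply opens the EPUB as a zip archive and reports flagged element names in the content files. Files that use named entities such as &nbsp; may be reported as unparseable, because the standard-library parser does not load the XHTML DTD.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumption for illustration: element names to report. This is not a schema.
FLAGGED = frozenset({"u", "marquee"})


def find_flagged_elements(epub, flagged=FLAGGED):
    """Return (member name, element name) pairs for flagged elements in an EPUB.

    `epub` may be a file path or a binary file-like object.
    """
    hits = []
    with zipfile.ZipFile(epub) as archive:
        for name in archive.namelist():
            if not name.lower().endswith((".xhtml", ".html", ".htm")):
                continue
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                hits.append((name, "<could not parse as XML>"))
                continue
            for elem in root.iter():
                local_name = elem.tag.rsplit("}", 1)[-1]  # strip any namespace
                if local_name in flagged:
                    hits.append((name, local_name))
    return hits


# In-memory demo archive: a bare zip with two content files, not a complete EPUB.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as demo_archive:
    demo_archive.writestr(
        "OEBPS/ch1.xhtml",
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<p><u>underlined</u></p></body></html>",
    )
    demo_archive.writestr(
        "OEBPS/ch2.xhtml",
        '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
        "<p>plain</p></body></html>",
    )
demo.seek(0)
demo_hits = find_flagged_elements(demo)
```

For a file on disk the call would be find_flagged_elements("book.epub"), where the path is a placeholder. Whether the result counts as "invalid" still depends on which schema the downstream tool validates against.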
04-18-2014, 04:46 PM | #30 |
Grand Sorcerer
Posts: 27,546
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
|
Oh, and for the record: Sigil (which calibre-edit was intended to replace) also added u tags to the code when using the toolbar. It didn't even pass its own Flightcrew validation--and for the life of me--I can't remember the uproar over it. People need to stop acting like calibre-edit shot someone's dog here.
Last edited by DiapDealer; 04-18-2014 at 05:02 PM. |
|