Why put deprecated element on toolbar? - Page 2

kovidgoyal · 04-18-2014, 09:50 AM

The world has moved on from XML. You need to stop clinging to XML in your workflows. A serious publishing tool, as you put it, should deal with actual, real-world HTML and produce code that is tested to work in actual real world software, rather than a DTD/spec that is a guarantee of precisely nothing.

A <u> tag is a much nicer construct than <span style="text-decoration:underline">, which is what the epub 2 spec would require here. Not only is it nicer, it actually works everywhere and has the added advantage of allowing for easy customization of styles via CSS overrides. All a user has to do is add u { text-decoration: none } to turn off underlines, which cannot be done with the style attribute.

Sometimes you have to look beyond the spec and implement a solution that is actually superior to what the spec recommends.

skreutzer · 04-18-2014, 11:17 AM

That's quite funny, it seems you don't have the faintest clue what you are talking about... XML is the universal format for structuring any kind of text-based data, and there are lots of powerful tools available to read and write XML files in all kinds of XML-based formats. This way, programmers don't need to write parsers over and over again just because the technical foundation of a text-based file is some kind of custom markup syntax. Actually, the world is moving more and more towards XML-based formats, which you might even observe with XHTML and EPUB. You might recall that it started out with an even more complex syntactical foundation for structuring text-based files called SGML, from which XML was developed as a simplified version. HTML was initially intended to promote the semantic web and was therefore centered around the concepts of SGML and XML, but then crappy HTML editors and browser-specific rendering made this vision impossible. The web is slowly recovering from this mess and aiming towards a semantically readable web again, of which EPUB can be a part and in which content/display separation with XHTML/HTML5 + CSS plays an important role.

I just want to point out that scientific publishing is heavily involved in a transformation towards digital publishing workflows based upon XML, that the entire e-book sector evolved around XML, that the web with DOM and AJAX has greatly benefited from XML, and that all publishing in general, digital and/or print, will be done with XML-based workflows at some point in time. I guess you're quite familiar with word processor formats like ODT or DOCX, and it isn't a coincidence that those formats are also based upon XML. Even Calibre is probably engaged with lots of XML-based input files, right?

I completely agree that <span style="text-decoration:underline"> isn't a much better solution, because this would distribute the direct formatting of underline all over the document, which is contrary to the concept of semantic markup. Instead, the ideal solution is to not provide an “underline” at all, because that only specifies the visual appearance of words and sentences instead of their meaning. If an author wants to underline something, he usually does it with an intention in mind, for instance, to underline all important words or those words which are added in comparison to a previous sentence. This intention gets lost if only “underline” is written into the file. Instead, it would be much better to define CSS classes “important” and “addition” and leave it up to the display/processing software to make sense of them (for instance, if all added words should be extracted or removed or displayed in red font), while the associated CSS class may still provide text-decoration:underline; as the default visual appearance.

It might be that Calibre isn't a semantic text editor and users expect an “underline” button, which loses the intention a user had while pressing it as long as it just underlines a marked word. But even if that's the case, an XHTML 1.1 + EPUB2 valid and non-directly-formatting solution would still be <span class="underline"> with a CSS class .underline { text-decoration: underline; } or something like it. Especially in HTML5, <u> has a different meaning and is advised to be avoided (in favor of more semantic markup like <em>), so at the moment browsers and e-readers might still render <u> elements as underlined, but that doesn't necessarily have to be the case in the future.

However, I guess Calibre is already making heavy use of CSS classes, so probably you could fix this issue comparatively easily, so that users don't run into validation issues any more, because regardless of what you think about XML validation, there are lots of people who build their workflows with and around XML, so XML validation won't go away any time soon.

I would be very interested in discussing semantic markup concepts in general and maybe even for Calibre, but I can imagine that you're actually not interested in it at all. It isn't an absolute requirement if the documents are of a basic nature, only an additional benefit. However, avoiding direct formatting is quite important, because whenever an EPUB gets transformed to another format, the visual definitions of CSS become void while the class names will get matched to their equivalent in the formatting description syntax of the target format.
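
To make the <span class="underline"> suggestion above concrete, here is a minimal Python sketch that rewrites every <u> element in an XHTML document into a classed <span>, using only the standard library's xml.etree.ElementTree. The function name replace_u_with_span, the class name and the sample document are made up for illustration; this is not how Calibre or any other editor actually does it, and the matching .underline { text-decoration: underline; } rule would still have to be added to the stylesheet separately.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def replace_u_with_span(xhtml_text, class_name="underline"):
    """Return the document with each <u> turned into <span class="...">.

    Hypothetical helper for illustration only.
    """
    # Serialize XHTML elements without an "ns0:" prefix.
    ET.register_namespace("", XHTML_NS)
    root = ET.fromstring(xhtml_text)
    # Materialize the matches first so renaming tags cannot disturb iteration.
    for elem in list(root.iter("{%s}u" % XHTML_NS)):
        elem.tag = "{%s}span" % XHTML_NS
        # Keep any classes already present and append the new one.
        classes = elem.get("class", "").split()
        if class_name not in classes:
            classes.append(class_name)
        elem.set("class", " ".join(classes))
    return ET.tostring(root, encoding="unicode")


sample = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<p>An <u>underlined</u> word.</p>"
    "</body></html>"
)
converted = replace_u_with_span(sample)
print(converted)
```

Because the input is parsed as XML, this only works on files that are already well-formed; tag soup would need an HTML parser first.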

kovidgoyal · 04-18-2014, 11:21 AM

Christ on a cross. http://www.whatwg.org/specs/web-apps...e/parsing.html

HTML is parsed by a single, well defined, algorithm. XHTML is history.

You go on believing whatever you like, I have neither the time nor the inclination to debate with you.

skreutzer · 04-18-2014, 11:29 AM

How can you be this ignorant about even basic technical concepts? Every file that ever needs to be read needs to be parsed, regardless of what format it is written in. With XML, however, one gets the benefit of using a predefined XML parser instead of re-implementing that algorithm over and over again. Additionally, with validated XML, the parsing algorithm doesn't need to be dumb regarding the input file; it can already know all elements that might occur and where they might occur, and react to them. Without it, one has to write a lot of extra code for all kinds of nonsense input for which one can't even know what to do in terms of rendering or transformation.

Abstraction is of great benefit for programming in general; programming libraries save the developer lots of time by not having to re-implement the same methods over and over again. XML, HTML and EPUB can be considered standardized common protocols on which standardized common programming libraries operate. Violating those protocols amounts to breaking the system.
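
The point about reusing a predefined parser can be sketched in a few lines of Python. The example below assumes nothing beyond the standard library's xml.etree.ElementTree; the helper name is_well_formed and the two sample fragments are invented for illustration. It only checks well-formedness, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise.

    Hypothetical helper: it relies on the stock parser rather than
    any hand-written parsing code.
    """
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# XHTML-style fragment: the empty element is explicitly closed.
closed = "<p>line one<br />line two</p>"
# HTML-style fragment: <br> is never closed, so </p> does not match.
unclosed = "<p>line one<br>line two</p>"

print(is_well_formed(closed))
print(is_well_formed(unclosed))
```

The second fragment is perfectly ordinary HTML, which is roughly where the two sides of this thread part ways: an XML parser refuses it outright, while an HTML parser is expected to recover.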

Toxaris · 04-18-2014, 01:01 PM

Quote:

Originally Posted by skreutzer

Actually, the world is moving more and more towards XML-based formats, which you might even observe with XHTML and EPUB.

Err, no. I am an Integration Architect, and within application integration XML is used less frequently than in the past. It is a bloated format which can cause serious issues in integration. It has its purposes, but it is not the holy grail. You are heavily dependent on the schemas being correct and well known.
Other formats are on the rise, looser and also easier to read. At least in my field the use of XML is declining.

In ePUB there is actually no need for X(HT)ML as opposed to HTML. It makes it more complex and adds nothing.

eschwartz · 04-18-2014, 01:44 PM

Why are we discussing xml at all? @skreutzer, your problem is with calibre having the <u> tag, which isn't about xml; it is about html and the epub standard.

As far as xml goes, the editor doesn't push bad xml on you. What are you arguing about, exactly??

DaleDe · 04-18-2014, 03:16 PM

Quote:

Originally Posted by eschwartz

Why are we discussing xml at all? @skreutzer, your problem is with calibre having the <u> tag, which isn't about xml; it is about html and the epub standard.

As far as xml goes, the editor doesn't push bad xml on you. What are you arguing about, exactly??

Well ePub2 is really not html but xhtml no matter what extension may be on the files. xhtml is a particular well defined implementation of xml. ePub 2 uses xhtml 1.1 as its basis. The most glaring difference is that xhtml requires closing tags on everything that is not marked specifically as not needing one. needs one and even would need one unless you say . This is good practice in telling the parser what you really mean. Also tags are lower case. While ePub3 is based on html5 it still wants that sort of structure in the code.

Dale

eschwartz · 04-18-2014, 03:56 PM

Quote:

Originally Posted by DaleDe

Well ePub2 is really not html but xhtml no matter what extension may be on the files. xhtml is a particular well defined implementation of xml. ePub 2 uses xhtml 1.1 as its basis. The most glaring difference is that xhtml requires closing tags on everything that is not marked specifically as not needing one. needs one and even would need one unless you say . This is good practice in telling the parser what you really mean. Also tags are lower case. While ePub3 is based on html5 it still wants that sort of structure in the code.

Dale

and therefore....
....
....

The <u> tag gets closed by a </u> tag...

While very informative about the general nature of epub and xhtml, I am sure, I still fail to see what that has to do with <u> being evil according to the epub spec. Also, I knew everything you just said; in no way did I contradict it. We are still only dealing with the html aspect of epub/xhtml, since that is what the <u> tag (and any discussion about it within the context of calibre and/or Kovid breaking the holy standards) actually touches on.

skreutzer · 04-18-2014, 04:09 PM

@Toxaris:
What is your impression, are those integration problems caused by proprietary applications and their questionable data formats or by the lack of tools that would translate and interact between XML formats if it were in XML? At least for publishing, there are huge advantages in XML-based workflows, and it is widely used in various ways. What kind of formats are increasing in application integration, binary ones with corresponding APIs to access them?

@eschwartz:
I frequently run into people who have problems with Calibre EPUB files (with reading them or using them for online services), and a portion of those problems are due to a lack of standards conformance, which is absolutely avoidable but forces users to fix those errors when they usually don't know how. I myself develop processing tools which convert to EPUB, especially to build automated workflows based upon semantic markup. I won't produce invalid output, I'll refuse invalid input, and I'll support front ends which are compliant with the various standards (so all conversions in the workflow, from the first character typed to the printed hardcover book, will be executed over well-defined interfaces). I don't support EPUB2 input yet, and I don't have a problem with <u> since I don't use Calibre (the issue wasn't initially posted by me). However, if file editors are sloppy with their output, a lot of files will need to be manually adjusted in the future or will be less usable than they could be, because visual appearance was encoded instead of meaning.

A developer of EPUB2 or XHTML 1.1 processing or reading software would not only be expected to support all elements that are specified by the standards, but also to react to the old crappy legacy stuff like <marquee> etc. This bloats code for no use at all, and if the task is just to do some primitive transformations, it is far better to ignore all the old, deprecated, non-standard constructs which shouldn't be there anyway.

DaleDe · 04-18-2014, 04:15 PM

Quote:

Originally Posted by eschwartz

and therefore....
....
....

The <u> tag gets closed by a </u> tag...

While very informative about the general nature of epub and xhtml, I am sure, I still fail to see what that has to do with <u> being evil according to the epub spec. Also, I knew everything you just said; in no way did I contradict it. We are still only dealing with the html aspect of epub/xhtml, since that is what the <u> tag (and any discussion about it within the context of calibre and/or Kovid breaking the holy standards) actually touches on.

Go read the xhtml 1.1 specification and you will see. You can also look at the list of supported tags in our wiki for ePub. I was just commenting on the fact that someone claimed that ePub wasn't XML compliant but it is.

Dale

DiapDealer · 04-18-2014, 04:18 PM

Oh what a tangled web we weave...

There really is no epub "standard" when all is said and done. There's just one organization's published "specifications" that the rest of the world cherrypicks and deviates from as they see fit. It's a glorified wishlist... and it'll never really be anything but.

skreutzer · 04-18-2014, 04:22 PM

As far as I know, there's at least no commonly accepted Calibre EPUB standard, so I stick with the IDPF and W3C, which both represent company entities as well as individuals and groups.

eschwartz · 04-18-2014, 04:29 PM

@skreutzer calibre is actually quite good about xml; it will only convert books into strictly xhtml-compliant code. So since the problem with the <u> tag has nothing to do with conformance to xml, but is rather an html and by extension an html-as-it-relates-to-xhtml problem, can we stop dragging xml workflow into it? Even though it is a cause you care about?

Quote:

Originally Posted by DaleDe

Go read the xhtml 1.1 specification and you will see. You can also look at the list of supported tags in our wiki for ePub. I was just commenting on the fact that someone claimed that ePub wasn't XML compliant but it is.

Dale

I never claimed <u> is a valid part of the epub spec. I simply said it has nothing to do with the xml aspect of it.

Quote:

Originally Posted by DiapDealer

Oh what a tangled web we weave...

There really is no epub "standard" when all is said and done. There's just one organization's published "specifications" that the rest of the world cherrypicks and deviates from as they see fit. It's a glorified wishlist... and it'll never really be anything but.

This is exactly why, in real life, <u> works for ebooks.

skreutzer · 04-18-2014, 04:39 PM

Just think about this issue as follows: EPUB gets processed with XML tools. XML tools use XML schema validation. The validation fails for an invalid EPUB input file. What are web services supposed to do in such a case? The initial post about <u> invalidity didn't mention whether this issue was raised by a web service or processing tool or whether the user just did a validation voluntarily, but at least he posted the issue and asked for a solution. Can he solve it in Calibre? Manually or automatically?
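
As a very rough stand-in for the validation step described above, the Python sketch below lists element names that are not in an allowed set. The ALLOWED set is a deliberately tiny, hypothetical subset and not the real XHTML 1.1 or EPUB 2 element list, and the names local_name and find_disallowed are invented here; real validators check against the full schema, including where each element may occur.

```python
import xml.etree.ElementTree as ET

# Hypothetical, incomplete allow-list used only for this illustration.
ALLOWED = {"html", "head", "title", "body", "p", "span", "em", "strong", "br"}


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def find_disallowed(xhtml_text, allowed=ALLOWED):
    """Return the sorted names of elements not present in the allow-list."""
    root = ET.fromstring(xhtml_text)
    found = {local_name(elem.tag) for elem in root.iter()}
    return sorted(found - set(allowed))


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Sample</title></head>"
    "<body><p>Some <u>underlined</u> and <em>emphasized</em> text.</p></body>"
    "</html>"
)
problems = find_disallowed(doc)
print(problems)
```

A service built this way can at least report which elements it objects to instead of simply rejecting the file.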

DiapDealer · 04-18-2014, 04:46 PM

Oh, and for the record: Sigil (which calibre-edit was intended to replace) also added <u> tags to the code when using the toolbar. It didn't even pass its own Flightcrew validation, and for the life of me I can't remember any uproar over it. People need to stop acting like calibre-edit shot someone's dog here.
