MobileRead Forums - View Single Post - Sigil as front end for automated XML based processing workflows?

Hitch · 01-07-2014, 03:29 AM

Quote:

Originally Posted by skreutzer

Well, yes! Basically, it would be enough to disable direct font and color selection (and other stuff that is direct formatting), and instead provide predefined styles, which could be changed and applied to the text easily, also imported and exported. If this isn't going to happen with the current word processors, there is still the alternative that advanced writer software like Scrivener would provide such a feature, since it would improve their very own typesetting facility (currently manual work via direct formatting word processor methodology) and directly benefit the self-publisher. Even without it, Sigil could be used as specialized software for the encoding task by people who do e-book and print book preparation in volume. Sigil already provides XML based output that is usable for all kinds of target formats. Sigil won't, for instance, have to deal with PDF creation, EPUB would be sufficient to feed into a processing system that would generate the PDF from it. Yes, the crucial point is to get valid, semantic XML, and the author himself or the writing software of the author should take care of it. If neither of those two does, a tool would be very valuable to strip down an input file to the pure text (probably still showing the visual appearance of the original) and allow semantic encoding to the person who is producing the target formats. This tool could be Sigil or a separate tool, creating valid XHTML input for Sigil, so Sigil would be the backend to create an EPUB. Alternatively, the software of the writer would initially take care of semantic encoding. Hopefully, writer and output producer talk to each other in order to either save the time of useless direct formatting on the writers side, or save time on the side of the output producer person by getting semantically encoded XML from the writer. If writer and output producer are the same person, the software should educate and require him to do the right thing, to do semantic markup.

I don't know if Sigil could be used as software to create semantic XML from junk input, since semantic XHTMLs in EPUB as Sigil output are usable as XML input for a automated processing workflow, or if a separate tool would me more reasonable, which could provide Sigil with semantic XHTML input in order to create EPUB. Since Sigil as a semantic EPUB editor could also be considered as a semantic XML editor, my main question is if Sigil could be changed to make the process of converting junk input into semantic XML as easy as possible, probably by the approach mentioned above (to throw all direct formatting away and apply style templates). For users who do conversions in volume and self-publishers who are serious about their process, this would be of great value, I guess.

I'm now not really sure if we're saying the same thing, or different things. At this juncture, I don't see any tool, at all, that is assisting in providing clean, properly-formatted XML into Sigil or any other workflow. My comprehension of your posts is that this is what you were considering creating, as it's extremely unlikely that any of the current writing tools on the market, whether Word, Scrivener, etc., are going to go in that direction?

Quote:

I would be personally interested why you think that the XML -> XSLT never took off, which seems to be true, but I wonder if it is for the simple reason that such workflows were always implemented as proprietary, restricted software, and not something you could set up and run on your own computer.

I don't think that's hard; the truth is that you either take a word-processed document, or something that's been through, say, INDD, and you can a) clean it and b) then export it to HTML in order create an ePUB for instant commercial use, and then c) export it into MOBI for commercial use, or b) clean it to create semantic XML in the first place, which then has to be processed again to create an ePUB and/or MOBI. In the former case, you essentially run 1+ processes, in the latter it's 2+ or 3, as creating a mobi from a good ePUB is simplicity itself. I think it's as simple as, XML isn't natively suited for print or faux-print layout, as it's basically Markup. Writers and editors don't write in Markup.

When Amazon came into the marketplace, they bought Mobipocket creator, and the Kindle ran on HTML 3.2. This drove the bookmaking market. I can't say I've done a boatload of XML cleanup, but the XML I've tried to export from Word, to investigate this idea (XML to XSLT) hasn't looked like a party to clean. Moreover, the retailers change their standards and their devices every 5 minutes. No major reader runs on XML; so...I think it was, quite simply, creating a process that would be able to reuse a file, to create other outputs, in a market that is primarily driven by entertainment books, seemed like extra work and an extra step that's unnecessary. PLUS, even if you assume arguendo that it's a good idea, then you have the problem of (say, with Textbooks), trying to export the initial content into a usable form for the author/editor to do an UPDATED version...whereas, with HTML, you can reimport the content easily back into Word or another word-processor for an author and editor to work collaboratively to update the material for a next Edition or updated textbook. Trust me: they are NOT going to sit there over something that looks like an RSS feed or XML and edit it. I think that's a major hurdle, too.

Quote:

Are there other reasons than this? Indeed, the topic looks complex, one needs to know several aspects of XML technology (Schema, XPath, XSLT, etc.), but it also can be build into tools and hidden away from the user. Today, in some programming languages, a XSLT processor is part of the libraries, so you could pack it all together to user friendly software, or use it as separate tools together with some automation scripts on a server in order to provide an online service or your own processing environment to generate output for people you work for/with.

If you say so. I admit, I've not seen anything that looks remotely user-friendly to which I could point my clients. And as I said somewhere in this thread, a major converter of books in India just invested a ton of money to invent/develop a system by which XML could be displayed in a Word-like, browser interface in order to provide a collaborative environment for textbook revisers/editors to work in. I'd have thought that if the environment existed, they wouldn't have spent all that money to create it, specifically for one client. I know someone else on this very forum considering creating a markup editor at one point in time; I don't know what happened with that.

Quote:

As example, I did XSLTs and a shell script to produce an EPUB from custom XML called hag2epub2, but now I did html2epub written in Java, which could have used the very same XSLTs if hag2epub2 weren't written for custom XML input. html2epub however, is a single tool without external dependencies except a Java VM, while hag2epub2 uses several external tools like a zip tool and XMLStarlet. html2epub is easier to use for more people, but hag2epub2 is easier to change and allows more flexibility.

Yes, but again: all of those, every single one, all depend on the cleaned, ready-to-go XML being prepared and ready. I see that as the huge stumbling block, myself. For commercial users, it would have to be as simple, and as easy, as "simply" exporting and cleaning to HTML/XHTML, and it would have to be something that we could convince our users that they want, and are willing to pay for. THAT would also be a fairly big block; convincing them that they want a cleaned XML file that they themselves likely won't ever open or use, or even foresee a need for. But, I could be wrong.

It's just that I could swear that I remember, way back, Josh Tallent being big on the whole XML-->XSLT-->DocBook Idea, and I'm pretty sure if it made any type of economic sense, then, he would have pursued it heavily. That he didn't is telling, at least, as a commercial producer. And I don't see XML--> XSLT--> etc. being a big seller for private users or DIY authors, but again: I could be utterly wrong.

I'm surprised more of the gang haven't chimed in here.???

Hitch