MobileRead Forums - View Single Post - Sigil as front end for automated XML based processing workflows?

skreutzer · 01-06-2014, 07:44 PM

Well, yes! Basically, it would be enough to disable direct font and color selection (and anything else that is direct formatting) and instead provide predefined styles that can be changed, applied to the text easily, and imported and exported. If the current word processors won't do this, advanced writing software like Scrivener could still provide such a feature, since it would improve its own typesetting facility (currently manual work in the direct-formatting style of word processors) and directly benefit self-publishers.

Even without that, people who prepare e-books and print books in volume could use Sigil as specialized software for the encoding task. Sigil already produces XML-based output that is usable for all kinds of target formats. Sigil would not, for instance, have to deal with PDF creation: an EPUB would be sufficient to feed into a processing system that generates the PDF from it.

Yes, the crucial point is to get valid, semantic XML, and the author or the author's writing software should take care of it. If neither does, a tool that strips an input file down to the pure text (perhaps still showing the visual appearance of the original) and lets the person producing the target formats do the semantic encoding would be very valuable. This tool could be Sigil itself, or a separate tool that creates valid XHTML input for Sigil, with Sigil then serving as the backend that creates the EPUB. Alternatively, the writer's software would take care of semantic encoding from the start. Hopefully, writer and output producer talk to each other, either to save the writer the time wasted on useless direct formatting, or to save the output producer time by getting semantically encoded XML from the writer. If writer and output producer are the same person, the software should educate him and require him to do the right thing: semantic markup.
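The "strip an input file down to the pure text" step could look roughly like the following sketch. It is not code from Sigil; it assumes well-formed XHTML input and treats every `style` attribute plus all `<font>` and `<span>` elements as direct formatting to discard (the class name `StripDirectFormatting` and that choice of elements are assumptions for illustration). It uses only the DOM and TrAX APIs that ship with the JDK (Java 11 or later assumed for `Set.of`).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class StripDirectFormatting {

    // Assumption: these elements carry only direct formatting, so they are
    // removed while their content (text and child elements) is kept.
    private static final Set<String> FORMATTING_ONLY = Set.of("font", "span");

    public static String strip(String xhtml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));

            // getElementsByTagName returns a live list, so copy it before changing the tree.
            NodeList live = doc.getElementsByTagName("*");
            List<Element> elements = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                elements.add((Element) live.item(i));
            }

            for (Element element : elements) {
                // Inline CSS is direct formatting: drop it everywhere.
                element.removeAttribute("style");
                if (FORMATTING_ONLY.contains(element.getLocalName())) {
                    // Move the children up to the parent, then remove the empty wrapper.
                    Node parent = element.getParentNode();
                    while (element.getFirstChild() != null) {
                        parent.insertBefore(element.getFirstChild(), element);
                    }
                    parent.removeChild(element);
                }
            }

            // Serialize the cleaned tree back to a string (identity transformation).
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | SAXException | IOException | TransformerException e) {
            throw new IllegalArgumentException("Could not process the XHTML input", e);
        }
    }

    public static void main(String[] args) {
        String junk = "<p xmlns=\"http://www.w3.org/1999/xhtml\" style=\"color:red\">"
                + "Hello <font face=\"Arial\">big</font> "
                + "<span style=\"font-weight:bold\">world</span></p>";
        System.out.println(strip(junk));
    }
}
```

The result keeps the text and the structural elements, so the person doing the encoding starts from clean markup instead of fighting leftover fonts and colors.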

I don't know whether Sigil could be used to create semantic XML from junk input (since the semantic XHTML files in an EPUB produced by Sigil are usable as XML input for an automated processing workflow), or whether a separate tool that provides Sigil with semantic XHTML input for creating the EPUB would be more reasonable. Since Sigil, as a semantic EPUB editor, could also be considered a semantic XML editor, my main question is whether Sigil could be changed to make converting junk input into semantic XML as easy as possible, perhaps with the approach mentioned above (throw all direct formatting away and apply style templates). For users who do conversions in volume and for self-publishers who are serious about their process, this would be of great value, I guess.
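The "apply style templates" half of that approach could be modelled as a small template of named styles that can be changed, applied as class references, exported as CSS, and imported from a file. This is a hypothetical sketch, not a Sigil feature: `StyleTemplate`, its method names, and the `name = declarations` import format (plain `java.util.Properties` syntax) are all assumptions.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import org.w3c.dom.Element;

public class StyleTemplate {

    // Predefined style name -> CSS declarations; insertion order keeps the exported CSS stable.
    private final Map<String, String> styles = new LinkedHashMap<>();

    // Defines a new style or changes an existing one. Names must work as CSS class names.
    public void define(String name, String cssDeclarations) {
        if (!name.matches("[A-Za-z][A-Za-z0-9_-]*")) {
            throw new IllegalArgumentException("Not a usable style name: " + name);
        }
        styles.put(name, cssDeclarations.trim());
    }

    // Applies a predefined style to an element: inline (direct) formatting is
    // discarded and only a reference to the named style remains.
    public void apply(Element element, String name) {
        if (!styles.containsKey(name)) {
            throw new IllegalArgumentException("Undefined style: " + name);
        }
        element.removeAttribute("style");
        element.setAttribute("class", name);
    }

    // Exports the template as a CSS stylesheet, e.g. for inclusion in an EPUB.
    public String toCss() {
        StringBuilder css = new StringBuilder();
        for (Map.Entry<String, String> style : styles.entrySet()) {
            css.append('.').append(style.getKey())
               .append(" { ").append(style.getValue()).append(" }\n");
        }
        return css.toString();
    }

    // Imports styles from a simple "name = declarations" file (java.util.Properties syntax).
    public void importFrom(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : props.stringPropertyNames()) {
            define(name, props.getProperty(name));
        }
    }

    public static void main(String[] args) {
        StyleTemplate template = new StyleTemplate();
        template.define("chapter-title", "font-size: 1.5em; text-align: center");
        template.define("body-text", "text-indent: 1em");
        System.out.print(template.toCss());
    }
}
```

Because `apply` only accepts names that exist in the template, the editor can refuse ad-hoc formatting while still letting the user change what a style looks like in one place.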

I would personally be interested in why you think the XML -> XSLT approach never took off. That seems to be true, but I wonder whether it is simply because such workflows were always implemented as proprietary, restricted software rather than something you could set up and run on your own computer. Are there other reasons? Admittedly, the topic looks complex, since one needs to know several XML technologies (Schema, XPath, XSLT, etc.), but they can also be built into tools and hidden from the user. Today, some programming languages include an XSLT processor in their standard libraries, so you could package it all into user-friendly software, or use separate tools together with some automation scripts on a server to provide an online service or your own processing environment that generates output for the people you work for or with.

As an example, I wrote XSLTs and a shell script called hag2epub2 that produce an EPUB from custom XML, and more recently html2epub, written in Java, which could have used the very same XSLTs if hag2epub2 weren't written for custom XML input. html2epub, however, is a single tool without external dependencies except a Java VM, while hag2epub2 uses several external tools such as a zip tool and XMLStarlet. html2epub is easier to use for more people, but hag2epub2 is easier to change and allows more flexibility.
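Java is one of the languages where an XSLT processor ships with the standard library: the JDK's `javax.xml.transform` (TrAX) API runs XSLT 1.0 stylesheets without any external tool. The sketch below applies a stylesheet to a small custom XML document; the `book`/`title`/`para` vocabulary and the class name `XsltStep` are made up for illustration and are not the input format or code of hag2epub2 or html2epub.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltStep {

    // Applies an XSLT 1.0 stylesheet to an XML document with the processor bundled in the JDK.
    public static String transform(String stylesheet, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(stylesheet)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("XSLT transformation failed", e);
        }
    }

    // Hypothetical stylesheet mapping a made-up custom vocabulary to XHTML.
    static final String BOOK_TO_XHTML =
            "<xsl:stylesheet version=\"1.0\""
            + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\""
            + " xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/book\"><body><xsl:apply-templates/></body></xsl:template>"
            + "<xsl:template match=\"title\"><h1><xsl:value-of select=\".\"/></h1></xsl:template>"
            + "<xsl:template match=\"para\"><p><xsl:apply-templates/></p></xsl:template>"
            + "</xsl:stylesheet>";

    public static void main(String[] args) {
        String xml = "<book><title>Example</title><para>First paragraph.</para></book>";
        System.out.println(transform(BOOK_TO_XHTML, xml));
    }
}
```

In practice the stylesheets would live in separate `.xsl` files (a `StreamSource` can also be built from a `File`), so they stay editable without recompiling the tool.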
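Bundling the zip step into the program is one way a tool can avoid depending on a separate zip utility. A minimal sketch with `java.util.zip`, assuming the caller already has the container files in memory (`EpubZipper` is a hypothetical name, not taken from html2epub): the EPUB container format requires `mimetype` to be the first entry and stored uncompressed, so that entry gets its size and CRC-32 set up front. Running epubcheck on the result is still advisable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class EpubZipper {

    // Writes an EPUB (OCF) container and closes target when done. Keys are paths inside
    // the container, e.g. "META-INF/container.xml"; the caller supplies all files except "mimetype".
    public static void write(Map<String, byte[]> files, OutputStream target) throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(target)) {
            // "mimetype" must come first and be stored uncompressed; a STORED
            // entry needs its size and CRC-32 set before it is written.
            byte[] mimetype = "application/epub+zip".getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();
            crc.update(mimetype);
            ZipEntry first = new ZipEntry("mimetype");
            first.setMethod(ZipEntry.STORED);
            first.setSize(mimetype.length);
            first.setCompressedSize(mimetype.length);
            first.setCrc(crc.getValue());
            zip.putNextEntry(first);
            zip.write(mimetype);
            zip.closeEntry();

            // All other files may be compressed (ZipOutputStream defaults to DEFLATED).
            for (Map.Entry<String, byte[]> file : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(file.getKey()));
                zip.write(file.getValue());
                zip.closeEntry();
            }
        }
    }

    // Convenience wrapper that builds the container in memory.
    public static byte[] toBytes(Map<String, byte[]> files) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            write(files, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

A caller would put `META-INF/container.xml`, the package document, and the XHTML files into a `LinkedHashMap` (to keep a predictable order) and pass a `FileOutputStream` to `write`.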

Last edited by skreutzer; 01-06-2014 at 07:54 PM.