MobileRead Forums - View Single Post - Sigil as front end for automated XML based processing workflows?

skreutzer · 01-06-2014, 03:42 PM

@st_albert: No, not as the primary "word processor". The primary software for authors should be a writing program with lots of features that support the writing process itself (note taking, timelines and the like). Some initial (semantic!) markup could already be involved there. Such writing software shouldn't deal with typesetting issues and e-book formatting at all. Its output would be the input for Sigil, and with Sigil the author (or the formatter of choice) would add whatever markup was missing. The Sigil output would then be passed to an automated XML processing system. All three steps could be done by the author himself using the tools, but also by another person or a service (online or offline). Cleaning up the mess of direct formatting would no longer be necessary, and the file would only get better at each step, even if the writer didn't do the markup himself while writing (at some point somebody has to add such structural information anyway, right?).

@Hitch: Yes, XML via XSLT and/or specialized tools to various output formats. Since XHTML is XML and an EPUB is essentially a packaged collection of XHTML files, XHTML and EPUB could be considered as input formats for automated processing workflows besides plain XML. In any case, those formats can easily be converted into each other. XHTML can be used in a very generic way, so that it isn't much different from a non-document-specific XML, while a web representation can be "suggested" for the XHTML file by defining the CSS classes.
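To illustrate the "generic XHTML plus CSS classes" idea, here is a minimal sketch in Python. It is a stand-in for an XSLT stylesheet, not XSLT itself, and it uses only the standard library. The element names in `TAG_MAP` are made up for this example, and the output is not placed in the XHTML namespace, which a real converter would have to do.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from custom semantic elements to (XHTML tag, CSS class).
TAG_MAP = {
    "book": ("div", "book"),
    "chapter": ("div", "chapter"),
    "title": ("h1", "title"),
    "para": ("p", "para"),
    "emphasis": ("em", "emphasis"),
}


def to_xhtml(node):
    """Recursively copy a custom XML element into an XHTML-style element."""
    # Unknown elements fall back to a <span> that keeps the original name as class.
    tag, css_class = TAG_MAP.get(node.tag, ("span", node.tag))
    out = ET.Element(tag, {"class": css_class})
    out.text = node.text
    out.tail = node.tail
    for child in node:
        out.append(to_xhtml(child))
    return out


source = "<chapter><title>One</title><para>Some <emphasis>marked</emphasis> text.</para></chapter>"
result = ET.tostring(to_xhtml(ET.fromstring(source)), encoding="unicode")
print(result)
```

Because the class names carry the semantics, a stylesheet can then decide how `.chapter` or `.title` should look on the web without touching the text.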

As long as you have the data in XML or any XML-based format (with proper semantic markup, of course), it can be used to generate nearly all output formats, be it directly or after an intermediate conversion. There's no technical reason why it should be impossible to convert EPUBs "back" for online use as a website, to create PDFs from EPUBs, or to convert them into custom XML formats. The same holds in the other direction: from a custom XML format, other custom XML formats, XHTML, EPUB or the input XML format of a processing workflow can be created with relative ease (well, implement it once, use it any time).
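As a rough sketch of the first step in converting an EPUB "back": an EPUB is a ZIP archive, `META-INF/container.xml` points to the package (OPF) file, and the OPF spine lists the XHTML documents in reading order. The function name `spine_documents` is made up for this example, and the sketch assumes plain, unencoded `href` values in the manifest.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
}


def spine_documents(epub_file):
    """Return the archive paths of the content documents in reading order."""
    with zipfile.ZipFile(epub_file) as z:
        # container.xml names the package document (the OPF file).
        container = ET.fromstring(z.read("META-INF/container.xml"))
        opf_path = container.find("c:rootfiles/c:rootfile", NS).get("full-path")
        opf = ET.fromstring(z.read(opf_path))
    # Manifest hrefs are relative to the directory of the OPF file.
    base = posixpath.dirname(opf_path)
    manifest = {item.get("id"): item.get("href")
                for item in opf.findall("opf:manifest/opf:item", NS)}
    return [posixpath.join(base, manifest[ref.get("idref")])
            for ref in opf.findall("opf:spine/opf:itemref", NS)]
```

With that list, each XHTML file can be read from the archive and handed to whatever produces the website, the PDF or the custom XML.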

For my own use, I've started with a very primitive XHTML to EPUB converter, since I want to reuse texts from my website for e-book distribution platforms. Another project of mine is digitizing and proofreading an old book (printed in blackletter) into XHTML and generating EPUB and PDF from it. Once that works, my website XHTML texts could also be converted to PDF, even if I don't plan to do so now. With all of those projects, I want to automatically replicate any changes to the text in all output formats (which is possible with high quality if the semantic markup is done right, unless it is a very special work that isn't worth automating). In general, one could come up with a lot of ideas for how to interconvert XML-based content into the various formats; some of them are "end formats", while others are XML-based and can be fed back into the processing workflow.
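This is not my actual converter, just a bare-bones sketch of what the packaging step of an XHTML to EPUB conversion involves, using only Python's standard library. The function name `build_epub`, the `OEBPS/` directory and the fixed language `en` are assumptions for the example. It also leaves out the navigation file (NCX or nav document), so the result would not pass a validator as it stands.

```python
import zipfile
from xml.sax.saxutils import escape, quoteattr

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(target, title, identifier, chapters):
    """Write a bare-bones EPUB; chapters is a list of (filename, xhtml_text)."""
    items = "".join(
        f'    <item id="c{i}" href={quoteattr(name)} media-type="application/xhtml+xml"/>\n'
        for i, (name, _) in enumerate(chapters))
    refs = "".join(f'    <itemref idref="c{i}"/>\n' for i in range(len(chapters)))
    opf = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="id">\n'
        '  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">\n'
        f'    <dc:title>{escape(title)}</dc:title>\n'
        f'    <dc:identifier id="id">{escape(identifier)}</dc:identifier>\n'
        '    <dc:language>en</dc:language>\n'  # assumed language for the sketch
        '  </metadata>\n'
        f'  <manifest>\n{items}  </manifest>\n'
        f'  <spine>\n{refs}  </spine>\n'
        '</package>\n')
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry must come first and must not be compressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER)
        z.writestr("OEBPS/content.opf", opf)
        for name, text in chapters:
            z.writestr("OEBPS/" + name, text)
```

A call could look like `build_epub("book.epub", "My Title", "urn:uuid:...", [("one.xhtml", xhtml_text)])`, with the identifier and file names being whatever the source texts provide.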

To make full use of such workflows, it is important to get valid, semantic XHTML (or any XML) in the first place. Sigil could be a tool to create it from plain text or horrible pseudo-HTML, so a lot of self-publishers could benefit from the automated processing of their texts. If, as with the company in Chennai, the author is enabled to provide semantic XML/XHTML himself, then e-book publication as well as the printed book via book-on-demand technology could be fully automated, with no human intervention needed (but still possible - for instance, to change the book layout the author has specified for his books for a single print run or so).
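A first gate for such a workflow can be as simple as checking that the input parses as XML at all. The sketch below only tests well-formedness, which is weaker than validity against the XHTML schema and says nothing about whether the markup is semantic. The function name `check_well_formed` is made up for the example.

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return None if text parses as XML, else a short error description."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        return str(err)
    return None


good = check_well_formed("<p>Fine <em>markup</em></p>")
bad = check_well_formed("<p>Unclosed <br> tag</p>")  # typical pseudo-HTML
print(good)
print(bad)
```

Input that fails here is exactly the kind of pseudo-HTML that would need cleaning up in Sigil before any automated processing can start.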
