MobileRead Forums - View Single Post - Automated Processing Workflows as and with Free Software

skreutzer · 01-30-2014, 07:25 AM

@DaleDe: Thanks for moving the thread; I was quite unsure where this topic belongs.

@roger64: Yes, I do exactly what st_albert described (using writer2latex as a standalone tool), but not as a direct ODT to EPUB conversion. Instead, I convert ODT to XHTML and then XHTML to EPUB, since there are a lot of things one might want to do to the XHTML in between: XML validation; adding, removing or moving parts of the document; writing a log of spelling and grammar mistakes; inserting soft hyphens; and so on. Further, with XHTML to EPUB as a separate, independent step, an ODT source file isn't required; other software or websites that export XHTML can be used as well. Furthermore, writer2latex mixes ODT with EPUB, LaTeX and every other future output format (a consequence of the package's goal of reproducing the ODT as it is displayed in the OpenOffice/LibreOffice WYSIWYG view). With separate steps, other front ends and back ends can be added and customized at any time without having to adjust all the other parts of the processing workflow. I make sure that ODT to XHTML to EPUB works at present, and I hope to be able to ensure this in the future too, while helping out with setting up the environment needed for such conversions. Currently I'm working on a shell script that automates the conversion from ODT to XHTML to EPUB. After that I plan to build a tool to manage book projects for this shell script (adding projects, managing metadata, running conversions in bulk), and then a tool to manage the setup of the workflow environment. Things like converting multiple ODTs to a single EPUB could then be added relatively easily.
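
To illustrate why the intermediate XHTML stage is useful, here is a minimal Python sketch of such a stage. It is not the shell script mentioned above, just an illustration under assumptions: the step functions, the sample document and the one-word hyphenation table are all made up, and a real workflow would use a proper hyphenation dictionary and schema validation rather than a bare well-formedness check.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without "ns0:" prefixes

# Toy hyphenation table (hypothetical); a real step would use a dictionary.
HYPHENATION = {"conversion": "con\u00adver\u00adsion"}


def drop_empty_paragraphs(root):
    """Remove <p> elements that contain neither text nor child elements."""
    p_tag = f"{{{XHTML_NS}}}p"
    for parent in list(root.iter()):
        for child in list(parent):
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if child.tag == p_tag and is_empty:
                parent.remove(child)
    return root


def insert_soft_hyphens(root):
    """Replace known words with their soft-hyphenated form in all text nodes."""
    for element in root.iter():
        for attr in ("text", "tail"):
            value = getattr(element, attr)
            if value:
                for word, hyphenated in HYPHENATION.items():
                    value = value.replace(word, hyphenated)
                setattr(element, attr, value)
    return root


def run_pipeline(xhtml, steps):
    """Parse XHTML, apply each step in order, and return the serialized result.

    Parsing raises ET.ParseError if the input is not well-formed XML.
    """
    root = ET.fromstring(xhtml)
    for step in steps:
        root = step(root)
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Demo</title></head>'
    "<body><p>A conversion test.</p><p> </p></body></html>"
)

if __name__ == "__main__":
    print(run_pipeline(SAMPLE, [drop_empty_paragraphs, insert_soft_hyphens]))
```

Each step only sees XHTML, so it doesn't matter whether the input came from writer2latex, another exporter or a website, and steps can be added or removed without touching the front end or the EPUB back end.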

@dickloraine: Other tools can be integrated at any time. However, I myself won't put time into integrating a tool which isn't free software or which adds unnecessary dependencies. You can still do so for yourself, and other people can provide support and help for proprietary, restrictive, non-free tools if they like, even if it is a pretty bad idea anyway. Some time ago I already experimented with pandoc, which is quite usable but written in Haskell, so my fear was (and is) that this could make it hard for other developers and the community to take full advantage of it. But it is definitely an option, yes, and it can be integrated as long as there aren't more advanced converters available to the processing workflow.

AsciiDoc isn't an option as a processing format, because nowadays it is technically not reasonable to support plain text formats other than XML for processing automation: XML is supported in most programming languages, while every custom plain text format has to be parsed separately. So custom plain text formats only play a role when they are converted to XML, or as target formats which are not intended to be processed any further (like *.tex). AsciiDoc could therefore be a front end for text editing, because AsciiDoc files can be converted to XML/XHTML.

As for your point that command line tools are not user friendly, keep in mind that command line tools are in most cases the only tools which can be automated, while GUI tools often cannot. So in general it is a good idea to maintain a tool as a command line tool in order to keep it automatable, and to develop a GUI on top of it to make it user friendly, just as LyX does for LaTeX.

For your LaTeX setup example, you should consider that installing LaTeX on a free operating system takes just one click, because software package management is an integral part of free software practices. On non-free, proprietary operating systems there are LaTeX ports available whose setup needs a few clicks instead of just one, but LaTeX is easy to install there too.
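
The two points above (XML being readable with standard libraries, and keeping the command line tool as the automatable core) can be sketched together in a few lines of Python. This is only an illustration under assumptions: the tool and the names `list_headings` and `main` are hypothetical, and it uses nothing but the Python standard library.

```python
import argparse
import sys
import xml.etree.ElementTree as ET

XHTML = {"x": "http://www.w3.org/1999/xhtml"}


def list_headings(xhtml):
    """Core logic: return the text of every <h1> in an XHTML document.

    Accepts str or bytes; raises ET.ParseError if the input is not well-formed.
    """
    root = ET.fromstring(xhtml)
    return ["".join(h.itertext()).strip() for h in root.iterfind(".//x:h1", XHTML)]


def main(argv=None):
    """Command line front end; a GUI could call list_headings() directly."""
    parser = argparse.ArgumentParser(description="List the h1 headings of an XHTML file.")
    parser.add_argument("xhtml_file", help="path to a well-formed XHTML file")
    args = parser.parse_args(argv)

    # Read bytes so the parser can honor the encoding in the XML declaration.
    with open(args.xhtml_file, "rb") as handle:
        data = handle.read()
    try:
        headings = list_headings(data)
    except ET.ParseError as error:
        print(f"not well-formed: {error}", file=sys.stderr)
        return 1
    for heading in headings:
        print(heading)
    return 0
```

A script entry point would simply call `sys.exit(main())`. Because `main` takes its arguments as a list and returns an exit code, a shell script, another program or a GUI button can all drive the same code.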

Some time ago I discovered river-valley.tv, and I'm revisiting it to look through the videos. I'll pick out some that describe the theoretical background and actual implementations of automated processing workflows, which are quite common in scientific and commercial contexts, so we might want something similar as free software for self-publishers or new online publishing services. I could come up with a lot of ideas for what could be implemented, but just as a hint: an automated processing workflow could take the mobileread.com RSS feed URL (or any URL that links to websites or parts of websites) as input and convert the latest posts you haven't read yet to EPUB, sorted by date, by topic or alphabetically, with or without an option to subscribe to specific topics or forum categories. Or an entire thread could be converted to a beautiful PDF, so that conversations can be preserved physically as a bound hardcover book via print-on-demand technology. One can already do so with wget/cURL, and websites or browsers may already provide EPUB and PDF download/export. Still, I would like to have universal processing workflows available that aren't tied to a browser acting as PDF creator, to a website's EPUB export, or to websites on the internet at all, because the very same code would also process the XHTML export from a word processor and other sources. Such workflows would benefit a lot more people and wouldn't be limited to specialized contexts.
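
The front end of the RSS idea could look roughly like the following Python sketch. It assumes a plain RSS 2.0 feed in which every item has `title`, `link` and `pubDate`; the sample feed, the example.org links and the function names are made up, and fetching the feed (with wget/cURL or otherwise) is left out. The output is ordinary XHTML, so the same XHTML to EPUB step as for any other source could take over from there.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

XHTML_NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", XHTML_NS)  # serialize XHTML without "ns0:" prefixes

# Made-up sample feed; a real feed would be downloaded beforehand.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Example forum</title>
<item><title>Third post</title><link>https://example.org/post/3</link>
<pubDate>Thu, 30 Jan 2014 10:34:00 +0000</pubDate></item>
<item><title>First post</title><link>https://example.org/post/1</link>
<pubDate>Wed, 29 Jan 2014 18:00:00 +0000</pubDate></item>
<item><title>Second post</title><link>https://example.org/post/2</link>
<pubDate>Thu, 30 Jan 2014 07:25:00 +0000</pubDate></item>
</channel></rss>"""


def unread_items(rss_xml, already_read):
    """Return the feed items whose link is not in already_read, oldest first."""
    channel = ET.fromstring(rss_xml).find("channel")
    items = []
    for item in channel.iter("item"):
        link = item.findtext("link", default="")
        if link in already_read:
            continue
        items.append({
            "title": item.findtext("title", default="(untitled)"),
            "link": link,
            # Assumes every item carries an RFC 822 style pubDate.
            "date": parsedate_to_datetime(item.findtext("pubDate")),
        })
    return sorted(items, key=lambda entry: entry["date"])


def to_xhtml(items, title):
    """Build a small XHTML document with one linked paragraph per item."""
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    for entry in items:
        paragraph = ET.SubElement(body, f"{{{XHTML_NS}}}p")
        anchor = ET.SubElement(paragraph, f"{{{XHTML_NS}}}a", href=entry["link"])
        anchor.text = f'{entry["date"]:%Y-%m-%d} {entry["title"]}'
    return ET.tostring(html, encoding="unicode")


if __name__ == "__main__":
    unread = unread_items(SAMPLE_FEED, already_read={"https://example.org/post/2"})
    print(to_xhtml(unread, "Unread posts"))
```

Sorting by topic or alphabetically would only change the `key` passed to `sorted`; the rest of the workflow stays the same.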

Last edited by skreutzer; 01-30-2014 at 10:34 AM.