Work in progress
Thanks for your ongoing interest despite my being well behind schedule.
What happened is that I decided to redesign the specifications so that the tool can accommodate any HTML file, as long as it has been converted (i.e. from .doc, .odt, ...) by Open Office. That meant dropping the requirements on HTML tagging and providing the required data through command line options instead, which meant adding command line options. Then I added still more options and ended up with a tool badly in need of a GUI.
So, it's still a script, but it also has a GUI module to a) define the project data (files, options), b) save that data as a "project file" and the files to a "project folder", and c) launch the script, which can still be used as a standalone command line tool, and report its progress and completion. The HTML->XHTML->ePUB code has also been fully rewritten to facilitate later maintenance.
Add the fact that my phone line (and with it my access to the Internet and to much-needed documentation) was down for a month due to a storm: development thus took longer than expected. At the moment I have:
1) the GUI
2) the ePUB production process
and their interaction
(almost; it's a matter of days now to fix non-critical bugs) working on both Unix and Windows (I've built ePUB files of up to 10 MB and tested them on my new Kobo).
Next steps are:
1) thoroughly testing a) every value for ePUB options b) correct working of error handling and error messages
2) "plugging in" the PDF composition code (strange as it may seem, this should be the easiest part: that part of the code has a longer history, has already been tested, and should not require many changes)
3) updating the documentation, which is a time-consuming task, as English is not my native language.
So, work is well on its way, but with spring coming I'll have outdoor occupations as well, and the pace will slow a bit. As I don't want to hand out an unfinished job, and don't want to develop and maintain at the same time, it unfortunately might take, let's say (being optimistic), a couple more months.
As for building a PDF from ePUB, here's how the tool works:
a) it converts HTML or XHTML files + internal or external CSS to clean XHTML files, on a one to one basis
b) it builds an ePUB from the converted XHTML files
c) it builds ConTeXt source code from the same XHTML files
d) it calls ConTeXt (TeX-based, like LaTeX) to compose them into a PDF file
Steps b), c) and d) are "on demand"; the converted XHTML + CSS + required graphics and fonts from step a) can be either archived (and thus opened with a browser) or deleted, as can the ConTeXt files from step c).
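To make the "on demand" logic above a bit more concrete, here is a minimal sketch in Python of how such a run could be planned. It is only an illustration, not the tool's actual code: the Options class, its field names and the plan() function are all made up for the example, and I assume that asking for step d) implies running step c) first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Options:
    """Hypothetical run options; none of these names come from the real tool."""
    build_epub: bool = False     # step b)
    build_context: bool = False  # step c)
    compose_pdf: bool = False    # step d), assumed to need the output of c)
    keep_xhtml: bool = False     # archive the converted XHTML instead of deleting it
    keep_context: bool = False   # archive the ConTeXt files instead of deleting them


def plan(opts: Options) -> List[str]:
    """Return the ordered list of actions for one run."""
    # Step a) always runs: everything else works from the clean XHTML.
    steps = ["a) convert HTML/XHTML + CSS to clean XHTML"]
    if opts.build_epub:
        steps.append("b) build ePUB from the converted XHTML")
    # Assumption: composing the PDF requires the ConTeXt source.
    need_context = opts.build_context or opts.compose_pdf
    if need_context:
        steps.append("c) build ConTeXt source from the converted XHTML")
    if opts.compose_pdf:
        steps.append("d) call ConTeXt to compose the PDF")
    # Intermediate files are either archived or deleted at the end.
    if need_context:
        steps.append("archive ConTeXt files" if opts.keep_context
                     else "delete ConTeXt files")
    steps.append("archive converted XHTML" if opts.keep_xhtml
                 else "delete converted XHTML")
    return steps


if __name__ == "__main__":
    for step in plan(Options(build_epub=True, compose_pdf=True)):
        print(step)
```

The point of planning first and executing afterwards is simply that step a) is the only mandatory one; everything else depends on what was asked for.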
There are almost no requirements on HTML tags in the input (no <DIV> unless <BODY> is its parent, and that's all as far as I remember), but CSS is restricted to tag {...} and tag.class {...} rules plus @page and @font-face rules. "th p" and "td p" rules are also understood, but that's all. This is what you find in the internal CSS of Open Office Writer HTML files. There are also non-blocking requirements on attributes/properties (i.e., what is not understood is discarded), which come from the need to translate them to their PDF layout equivalents.
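For those who want to check their own CSS against the restrictions above, here is a rough sketch in Python of a selector test. Again, this is an illustration under assumptions and not the tool's real parser: the function name and the regular expression are mine, at-rules are judged by their name only, and grouped selectors such as "h1, h2" are simply treated as not supported.

```python
import re

# Shapes of selectors described above: "tag", "tag.class", "th p" and "td p".
TAG = r"[A-Za-z][A-Za-z0-9]*"
CLASS = r"[A-Za-z_][A-Za-z0-9_-]*"
SELECTOR_RE = re.compile(rf"^(?:{TAG}(?:\.{CLASS})?|t[hd]\s+p)$", re.IGNORECASE)

# The only at-rules mentioned as understood.
ALLOWED_AT_RULES = ("@page", "@font-face")


def is_supported_selector(selector: str) -> bool:
    """Return True if the selector has one of the shapes listed above."""
    s = selector.strip()
    if s.startswith("@"):
        # Compare the at-rule name only, ignoring whatever follows it.
        m = re.match(r"@[a-z-]+", s.lower())
        return bool(m) and m.group(0) in ALLOWED_AT_RULES
    return bool(SELECTOR_RE.match(s))


if __name__ == "__main__":
    for sel in ["P", "H1.western", "TD P", "@page", "div p", "#main", "a:link"]:
        print(f"{sel!r}: {is_supported_selector(sel)}")
```

Descendant selectors other than "th p" and "td p", id selectors and pseudo-classes all fail the test, which matches the list of rules given above.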
So, basically, unzipping the ePUB file, using its XHTML, CSS, JPEG and PNG files as input and doing nothing but composing a PDF document will work if the above requirements are met. ePUB to PDF "out-of-the-box" could of course be added (it is mostly a matter of extracting files to a temporary folder), but I'd rather wait until 1) the tool is ready to be "released", 2) it is found easy enough to use, and 3) I find out to what use it is put. The basic idea is of course to provide a free self-publishing tool which requires only knowledge of either HTML or a word processor and produces both ePUB and PDF of reasonable "quality", but software often ends up being used for other purposes than originally intended.
Please be patient. When the tool is "out", it'll be easier, and I'll be more than willing to discuss and work on the design of existing and new features, options, etc. In the meantime, I'll give notice of work progress from time to time.
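The "extracting files to a temporary folder" part could look roughly like the Python sketch below, which relies only on the standard zipfile and tempfile modules. The function name, the "epub2pdf_" folder prefix and the list of extensions are assumptions made for the example; the sketch also ignores the reading order declared inside the ePUB, which a real ePUB to PDF option would have to take into account.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List, Optional

# File types named above as usable input (the extension list is an assumption).
WANTED = {".xhtml", ".css", ".jpg", ".jpeg", ".png"}


def extract_epub_inputs(epub_path: str, dest: Optional[str] = None) -> List[Path]:
    """Extract the XHTML, CSS, JPEG and PNG members of an ePUB.

    Files go to `dest`, or to a fresh temporary folder when `dest` is None.
    Returns the paths of the extracted files.
    """
    dest_dir = Path(dest) if dest else Path(tempfile.mkdtemp(prefix="epub2pdf_"))
    extracted = []
    # An ePUB is a zip archive, so the zipfile module can read it directly.
    with zipfile.ZipFile(epub_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            if Path(info.filename).suffix.lower() in WANTED:
                # extract() keeps the folder layout of the archive under
                # dest_dir and returns the path of the file it wrote.
                extracted.append(Path(zf.extract(info, dest_dir)))
    return extracted


if __name__ == "__main__":
    import sys
    for path in extract_epub_inputs(sys.argv[1]):
        print(path)
```

Keeping the folder layout of the archive matters because the XHTML files refer to their stylesheets and images by relative path.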