Since I have a little experience writing converters, I'd just like to say that if somebody does write a new, improved Gutenberg-to-HTML converter, it should use a well-defined semantic scheme expressed through CSS classes. This would make the HTML much better suited to conversion into an ebook format like EPUB or LRF.
Some important things to have in the generated HTML would be
1. A meta tag identifying the type of file (i.e. identifying it as the output of that automatic converter). This is necessary for parsing the semantic information.
2. CSS classes for things like page breaks, chapter titles, chapter subtitles, and inline vs. block vs. full-page images.
3. Use of semantic HTML tags like <em> and <strong> instead of <i> and <b>
etc.
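To illustrate why points 1 and 2 matter for downstream tools, here is a rough Python sketch of how an ebook converter might read such output. The generator name "gutenberg2html" and the class names "chapter-title" and "page-break" are made up for the example; a real converter would define its own scheme.

```python
from html.parser import HTMLParser

# Hypothetical names, chosen only for this illustration.
GENERATOR = "gutenberg2html"
CHAPTER_TITLE_CLASS = "chapter-title"
PAGE_BREAK_CLASS = "page-break"


class SemanticScanner(HTMLParser):
    """Reads the generator meta tag, chapter titles, and page breaks."""

    def __init__(self):
        super().__init__()
        self.generator = None
        self.chapter_titles = []
        self.page_breaks = 0
        self._title_tag = None  # tag name of the chapter title being read
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "meta" and attrs.get("name") == "generator":
            self.generator = attrs.get("content")
        elif PAGE_BREAK_CLASS in classes:
            self.page_breaks += 1
        elif CHAPTER_TITLE_CLASS in classes and self._title_tag is None:
            self._title_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Collect text only while inside a chapter title element.
        if self._title_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._title_tag is not None and tag == self._title_tag:
            self.chapter_titles.append("".join(self._buffer).strip())
            self._title_tag = None


def scan(html):
    """Parse an HTML string and return the populated scanner."""
    scanner = SemanticScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


SAMPLE = """<html><head>
<meta name="generator" content="gutenberg2html">
</head><body>
<h2 class="chapter-title">Chapter <em>One</em></h2>
<p>Some text.</p>
<div class="page-break"></div>
<h2 class="chapter-title">Chapter Two</h2>
</body></html>"""

result = scan(SAMPLE)
if result.generator == GENERATOR:
    # Only trust the class names when the file identifies its converter.
    print(result.chapter_titles, result.page_breaks)
```

The check on the generator value is the reason for the meta tag: without it, a tool cannot tell whether a class like "chapter-title" carries the agreed meaning or is just a coincidence in hand-written HTML.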