01-23-2009, 02:41 PM   #14
mtravellerh
book creator
Posts: 9,613
Karma: 1620342
Join Date: Oct 2008
Location: Luxembourg
Device: PB360
Quote:
Originally Posted by kovidgoyal
If you accept arbitrary HTML as input and want to output standards-compliant HTML, the only way to do that is to strip the HTML down to a basic internal markup and then re-export it. This is, for example, what BookDesigner does. There is no way you can accept arbitrary HTML input and losslessly convert it to standards-compliant HTML output (and no, htmltidy doesn't do this).

So really what the tool will have to do is:

1) Accept HTML input
2) Parse the HTML input into some simple internal markup
3) Try to auto-identify structural components (or ask the user for input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
If you do that (5), people like Coolmicro will get up and shout again that the resulting EPUB does not conform to the standard and that the HTML code is not "clean". (I really do not care about "clean" or "dirty" code myself, as long as it does what it has to do, as Calibre does, for example.) So I am all for it.
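
For anyone who wants to see what the quoted five steps could look like in practice, here is a rough sketch in Python using only the standard library's html.parser. The names Simplifier, simplify and export_xhtml and the (kind, text) block format are made up for this example; this is not how BookDesigner or Calibre work internally. Step 3 is reduced to a trivial "heading tag means heading" rule, step 4 (the editor) is left out, and step 5 stops at a single XHTML document rather than a packaged EPUB.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical internal markup: a flat list of (kind, text) blocks,
# where kind is "heading" or "para". All other markup is thrown away,
# which is exactly why the conversion is lossy.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}
BLOCK_TAGS = HEADING_TAGS | {"p", "div", "br", "li"}


class Simplifier(HTMLParser):
    """Steps 1-3: accept arbitrary HTML and reduce it to simple blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._buf = []
        self._kind = "para"
        self._skip = 0  # greater than 0 while inside <script> or <style>

    def _flush(self):
        # Collapse whitespace and store the block if it has any text.
        text = " ".join("".join(self._buf).split())
        if text:
            self.blocks.append((self._kind, text))
        self._buf = []
        self._kind = "para"

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in BLOCK_TAGS:
            self._flush()  # an unclosed <p> is ended by the next block tag
            if tag in HEADING_TAGS:
                self._kind = "heading"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if not self._skip:
            self._buf.append(data)


def simplify(html_text):
    """Parse messy HTML into the internal (kind, text) block list."""
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()
    parser._flush()  # keep any trailing text after the last tag
    return parser.blocks


def export_xhtml(blocks, title="Untitled"):
    """Step 5 (partial): serialize the blocks as one well-formed XHTML file."""
    body = "\n".join(
        f"<h2>{escape(text)}</h2>" if kind == "heading" else f"<p>{escape(text)}</p>"
        for kind, text in blocks
    )
    return (
        '<?xml version="1.0" encoding="utf-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body>\n</html>\n"
    )
```

Feeding it something like `<h1>Chapter <font color=red>One</font></h1><p>Hello <b>world</b><p>Unclosed` gives three plain blocks, and the exported file contains only h2 and p elements. The output is tidy because the font, bold and script markup is simply dropped, which is the trade-off the quote describes.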