Originally Posted by zelda_pinwheel
why are 1 and 2 incompatible, kovid ?
If you accept arbitrary HTML as input and want to output standards-compliant HTML, the only way to do that is to strip the HTML down to a basic internal markup and then re-export it. This is, for example, what BookDesigner does. There is no way to accept arbitrary HTML input and losslessly convert it to standards-compliant HTML output (and no, htmltidy doesn't do this).
So really what the tool will have to do is:
1) Accept HTML input
2) Parse the HTML input into some simple internal markup
3) Try to auto-identify structural components (or ask the user to provide input to help identify them)
4) Provide an editor interface for the internal markup
5) Export the internal markup to EPUB
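
To make steps 1, 2 and 5 a bit more concrete, here is a rough Python sketch using only the standard library's html.parser. Everything named here (BLOCK_TAGS, SimpleMarkupParser, parse_to_internal, export_xhtml) is made up for illustration and is not how any existing tool does it. The "internal markup" is just a list of (kind, text) pairs, the tag table is a crude stand-in for the structure detection in step 3, and the export only produces an XHTML fragment, not a packaged EPUB. Note that it is lossy on purpose: inline formatting, attributes and unknown tags are thrown away, which is exactly the trade-off described above.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical mapping from input tags to a tiny internal vocabulary.
# Anything not listed here is dropped; only its text survives.
BLOCK_TAGS = {"h1": "heading", "h2": "heading", "h3": "heading",
              "p": "para", "div": "para"}


class SimpleMarkupParser(HTMLParser):
    """Reduce arbitrary HTML to a list of (kind, text) blocks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []
        self._kind = None
        self._text = []

    def flush(self):
        # Collapse whitespace and store the pending text as one block.
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append((self._kind or "para", text))
        self._text = []

    def handle_starttag(self, tag, attrs):
        # A new block-level tag ends the previous block, closed or not.
        if tag in BLOCK_TAGS:
            self.flush()
            self._kind = BLOCK_TAGS[tag]

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.flush()
            self._kind = None

    def handle_data(self, data):
        self._text.append(data)


def parse_to_internal(html_text):
    """Steps 1 and 2: accept HTML and reduce it to the internal markup."""
    parser = SimpleMarkupParser()
    parser.feed(html_text)
    parser.close()   # forces any buffered trailing text through handle_data
    parser.flush()
    return parser.blocks


def export_xhtml(blocks):
    """Step 5 (partial): emit a clean XHTML fragment from the internal markup."""
    lines = []
    for kind, text in blocks:
        tag = "h1" if kind == "heading" else "p"
        lines.append(f"<{tag}>{escape(text)}</{tag}>")
    return "\n".join(lines)


if __name__ == "__main__":
    messy = "<H1 class=x>Chapter 1<P>Some <b>bold</b> text &amp; more"
    print(export_xhtml(parse_to_internal(messy)))
```

The unclosed tags, uppercase names and unquoted attribute in the sample input all disappear on the way through, because the output is regenerated from the internal markup rather than patched up from the input.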