@theducks, he means Python code classes. ;-)
@burbleburble, if you want to join the HTML before processing it, look at the htmlz output format, which has code to join HTML files. I don't know whether it updates the manifest for you, so you may still need to implement that. Anyway, I think you'll find your task easier if you don't join the files. If the file was originally split in the wrong place, the user can use htmlz conversion to re-join it, convert back to ePub using heuristics, structure detection, etc. to get the correct split points, and then use your tool for further cleanup.
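For what it's worth, here is a rough sketch of that round trip driven from Python through calibre's ebook-convert command line tool. The function names, the "_resplit" output name and the work directory are made up for illustration. I've only passed --enable-heuristics; whichever structure detection options suit a given book would need to be added by hand.

```python
import shutil
import subprocess
from pathlib import Path


def build_roundtrip_commands(epub_in, work_dir):
    """Build the two ebook-convert calls: ePub -> HTMLZ, then HTMLZ -> ePub.

    The output file names are hypothetical choices made for this sketch.
    """
    epub_in = Path(epub_in)
    work_dir = Path(work_dir)
    htmlz = work_dir / (epub_in.stem + ".htmlz")
    epub_out = work_dir / (epub_in.stem + "_resplit.epub")
    return [
        # Step 1: converting to HTMLZ joins the split HTML files into one.
        ["ebook-convert", str(epub_in), str(htmlz)],
        # Step 2: convert back to ePub so calibre picks new split points.
        ["ebook-convert", str(htmlz), str(epub_out), "--enable-heuristics"],
    ]


def run_roundtrip(epub_in, work_dir):
    """Run both conversions and return the path of the re-split ePub."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found on PATH; is calibre installed?")
    Path(work_dir).mkdir(parents=True, exist_ok=True)
    commands = build_roundtrip_commands(epub_in, work_dir)
    for cmd in commands:
        # check=True stops at the first conversion that fails.
        subprocess.run(cmd, check=True)
    return Path(commands[-1][2])
```

Calling run_roundtrip("book.epub", "work") would leave work/book_resplit.epub ready for the cleanup tool. Keeping the command building separate from the running makes it easy to inspect or tweak the options first.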
Last edited by ldolse; 06-16-2011 at 10:42 PM.