06-16-2011, 10:37 PM   #21
ldolse
@theducks, he means Python code classes. ;-)

@burbleburble, if you want to join the HTML before processing it, look at the htmlz output format; it has code to join HTML files. I don't know whether it updates the manifest for you, so you may still need to implement that yourself. That said, I think you'll find your task easier if you don't join the files. If a file was originally split in the wrong place, the user can use htmlz conversion to re-join it, convert back to ePub using heuristics, structure detection, etc. to get the correct split points, and then use your tool for further cleanup.
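
For what it's worth, here is a rough sketch of that round trip driven from Python, assuming calibre's ebook-convert command line tool is on the PATH. The function names and file names are made up for illustration, and I haven't checked how well the re-split works on any particular book, so treat it as a starting point only.

```python
import shutil
import subprocess


def build_round_trip_commands(epub_in, htmlz_tmp, epub_out, extra_options=()):
    """Build the two ebook-convert command lines for ePub -> htmlz -> ePub.

    All names here are hypothetical helpers, not part of calibre itself.
    """
    # Step 1: convert to htmlz, which joins the book's HTML files.
    join_cmd = ["ebook-convert", epub_in, htmlz_tmp]
    # Step 2: convert back to ePub with heuristic processing enabled so that
    # calibre picks new split points. Structure detection options can be
    # passed through extra_options.
    resplit_cmd = ["ebook-convert", htmlz_tmp, epub_out, "--enable-heuristics"]
    resplit_cmd.extend(extra_options)
    return [join_cmd, resplit_cmd]


def run_round_trip(epub_in, htmlz_tmp, epub_out, extra_options=()):
    """Run both conversions in order, stopping at the first failure."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert was not found on the PATH")
    for cmd in build_round_trip_commands(epub_in, htmlz_tmp, epub_out, extra_options):
        subprocess.run(cmd, check=True)
```

Something like run_round_trip("book.epub", "book.htmlz", "book_fixed.epub") would then produce the re-split ePub, and the cleanup tool can be run on that output.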

Last edited by ldolse; 06-16-2011 at 10:42 PM.