I did this for a few conversions the other day. Here's what I did to produce nice, clean books.
1. Use "Calibre" to convert the file to an EPUB. While you're at it, edit the metainfo and insert a cover page.
2. Open the resulting EPUB in "7-Zip". Extract the HTML file(s).
3. If there's more than one HTML file, copy/paste them into a single large file (only the BODY contents of course).
4. Run "HTML Splitter" on the file, splitting on "H1" tags. You now have one HTML file per chapter.
5. Run "Sigil", load the EPUB. Delete the existing HTML files and import your newly created ones in order. Click the "Tools" menu and choose the TOC editor. It'll pick up the H1 tags and create a nice table of contents.
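If you'd rather script steps 2 to 4 than do them by hand, here is a rough Python sketch of the same idea. It is not what "7-Zip" or "HTML Splitter" do internally, just a stand-in under a few assumptions: the EPUB's HTML files are UTF-8, sorting them by name gives the reading order (the real order lives in the EPUB's OPF file), and a regex is good enough to find the BODY and H1 tags. The function names are made up for illustration.

```python
import re
import zipfile

# Assumption: simple regexes are enough for Calibre's output; a real HTML
# parser would be more robust on messy markup.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
H1_RE = re.compile(r"(?=<h1[\s>])", re.IGNORECASE)  # zero-width: split *before* each H1


def body_of(html):
    """Return the contents of the BODY element, or the whole text if none is found."""
    match = BODY_RE.search(html)
    return match.group(1) if match else html


def merged_body(epub_path):
    """Steps 2-3: an EPUB is a ZIP archive, so read its HTML files and join their bodies."""
    parts = []
    with zipfile.ZipFile(epub_path) as epub:
        # Assumption: file-name order matches reading order.
        for name in sorted(epub.namelist()):
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                parts.append(body_of(epub.read(name).decode("utf-8")))
    return "\n".join(parts)


def split_on_h1(body):
    """Step 4: one piece per H1; anything before the first H1 becomes its own piece."""
    return [piece for piece in H1_RE.split(body) if piece.strip()]
```

Calling `split_on_h1(merged_body("book.epub"))` (the file name is a placeholder) gives one string per chapter. Each piece still needs to be wrapped in a full HTML document and saved to its own file before you import it into Sigil.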
This sounds like a lot of work but it's fast after the first time you've done it. You do not need to produce separate HTML files per chapter, but I prefer to. It keeps chapters loading quickly.
Of course, this will not work if you do not have "H1" tags indicating your chapters. You can put these in manually if the conversion does not do it for you.
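If there are too many chapters to tag by hand, something like this can do a first pass. It's only a sketch and assumes your chapter titles sit in their own paragraphs and start with "Chapter" followed by a number. That pattern is my assumption, so adjust it to whatever your book actually uses. The function name is made up.

```python
import re

# Assumption: a chapter title is a whole <p> that starts with "Chapter <number>".
CHAPTER_P_RE = re.compile(
    r"<p[^>]*>\s*(Chapter\s+\d+[^<]*?)\s*</p>", re.IGNORECASE
)


def promote_chapter_headings(html):
    """Rewrite matching paragraphs as H1 headings so a splitter or TOC tool can find them."""
    return CHAPTER_P_RE.sub(r"<h1>\1</h1>", html)
```

Check the result before splitting. An ordinary paragraph that happens to begin with "Chapter 2 was..." and has no inline tags would be promoted too.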