The main issue is: what do you do with "found HTML" over which you have no control?
I'm talking about static HTML sites where a clean TOC may not exist.
Are there any ways to autogenerate this kind of TOC? I come across hundreds of static HTML sites that I'd like to convert into a portable ebook format. In most cases I just cut and paste the text, but then I lose any TOC or navigation.
The workflows described here assume the formatter has some control over what kind of html pages he has to deal with.
I guess HTML Tidy could clean up the markup, and then maybe XSLT could strip out the JavaScript and autogenerate a TOC from each page's title or H1 tag.
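To make the title/H1 idea concrete, here is a rough sketch in Python rather than XSLT, using only the standard library. It assumes the pages are plain .html files sitting in one directory and that the first H1 (or, failing that, the title) is a usable chapter name. The names HeadingGrabber, page_label, build_toc, toc_for_directory and the toc.html file name are made up for illustration and don't come from any existing tool.

```python
from html import escape
from html.parser import HTMLParser
from pathlib import Path


class HeadingGrabber(HTMLParser):
    """Collects the text of the first <h1> and the first <title> of one page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.found = {"h1": None, "title": None}
        self._tag = None  # tag currently being captured, if any
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # Start capturing only for the first occurrence of each wanted tag.
        if tag in self.found and self.found[tag] is None and self._tag is None:
            self._tag = tag
            self._buf = []

    def handle_data(self, data):
        if self._tag is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the markup.
            self.found[tag] = " ".join("".join(self._buf).split())
            self._tag = None


def page_label(html_text, fallback):
    """Prefer the first <h1>, then <title>, then the supplied fallback."""
    grabber = HeadingGrabber()
    grabber.feed(html_text)
    grabber.close()
    return grabber.found["h1"] or grabber.found["title"] or fallback


def build_toc(entries):
    """Build a simple TOC page from a list of (href, label) pairs."""
    items = "\n".join(
        f'<li><a href="{escape(href)}">{escape(label)}</a></li>'
        for href, label in entries
    )
    return (
        "<html><head><title>Contents</title></head><body>\n"
        f"<h1>Contents</h1>\n<ul>\n{items}\n</ul>\n</body></html>\n"
    )


def toc_for_directory(directory):
    """Scan *.html files in name order (an assumption) and return TOC markup."""
    entries = []
    for path in sorted(Path(directory).glob("*.html")):
        if path.name == "toc.html":  # skip a previously generated TOC
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        entries.append((path.name, page_label(text, path.stem)))
    return build_toc(entries)
```

Writing the result of toc_for_directory("some_site") to toc.html in the same directory would give one page that links to all the others. Sorting by file name is only a guess at reading order; a real site may need the order fixed by hand. This also does none of the cleanup HTML Tidy would do, so badly broken markup may still trip it up.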
Are there any automated tools for doing this?
For one thing, I have a Sony PRS-505, and I don't know how to make an ebook out of a dozen HTML files (using Calibre, for instance).