MobileRead Forums - View Single Post

hapax legomenon · 11-02-2008, 08:18 PM

The main issue is: what do you do with "found HTML" over which you have no control?

I'm talking about static HTML sites that may not have a clean TOC.

Is there any way to autogenerate a TOC for sites like these? I come across hundreds of static HTML sites that I'd like to convert into a portable ebook format. In most cases I just cut and paste the text, but then I lose the TOC.

The workflows described here assume the formatter has some control over the kind of HTML pages he has to deal with.

I guess HTML Tidy can clean up the code, and maybe you could use XSLT to remove the JavaScript and then autogenerate a TOC based on each page's title or H1 tag.
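A minimal sketch of the title/H1 idea, assuming Python and only its standard library rather than Tidy plus XSLT. It does no cleanup of the markup; it only skips script and style content while reading each page. The names HeadingFinder, page_label and build_toc are made up for illustration, and preferring the H1 over the title is an assumption, not something settled in this thread.

```python
from html import escape
from html.parser import HTMLParser


class HeadingFinder(HTMLParser):
    """Collect the text of the first <title> and the first <h1> in a page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._current = None   # tag whose text is being captured, if any
        self._skip = 0         # nesting depth inside <script>/<style>
        self.found = {}        # e.g. {"title": "...", "h1": "..."}

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag in ("title", "h1") and tag not in self.found:
            self._current = tag
            self.found[tag] = ""

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1
        elif tag == self._current:
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags such as <em> is appended as well.
        if self._current and not self._skip:
            self.found[self._current] += data


def page_label(html_text, fallback):
    """Return the page's H1 text, else its title, else the fallback."""
    finder = HeadingFinder()
    finder.feed(html_text)
    finder.close()
    for tag in ("h1", "title"):  # assumption: H1 is the better chapter name
        text = " ".join(finder.found.get(tag, "").split())
        if text:
            return text
    return fallback


def build_toc(pages):
    """pages: list of (filename, html_text) in reading order.

    Returns the text of a simple HTML contents page linking to each file.
    """
    items = []
    for name, html_text in pages:
        label = page_label(html_text, fallback=name)
        items.append('<li><a href="%s">%s</a></li>'
                     % (escape(name, quote=True), escape(label)))
    return ("<html><head><title>Contents</title></head><body>\n"
            "<h1>Contents</h1>\n<ol>\n" + "\n".join(items) +
            "\n</ol>\n</body></html>\n")
```

To use it, read each saved page into a string, pass the (filename, text) pairs in reading order to build_toc, and save the result next to the pages as a contents page. Pages with neither an H1 nor a title fall back to the filename, so really messy sites will still need hand editing.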

Are there any automated tools for doing this?

For one thing, I have a Sony PRS-505, and I don't know how to make an ebook out of a dozen HTML files (using Calibre, for instance).
