Old 11-02-2008, 08:18 PM   #3
hapax legomenon
Erotica Writer
Join Date: Jul 2007
Location: Tulsa, OK
Device: ipad, Sony Reader PRS 505, Cybook 3
The main issue is: what do you do with "found HTML" over which you have no control?

I'm talking about static HTML sites that may not have a clean TOC.

Is there any way to autogenerate this kind of TOC? I come across hundreds of static HTML sites that I'd like to convert into a portable ebook format. In most cases I just cut and paste the text, but then I end up without a TOC.

The workflows described here assume the formatter has some control over what kind of html pages he has to deal with.

I guess HTML Tidy can clean up the code, and maybe you could use XSLT to strip out the JavaScript and then autogenerate a TOC from each page's title or H1 tag.
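
To make the title-or-H1 idea concrete, here is a rough sketch in Python rather than XSLT (a swap on my part; it uses only the standard library's html.parser, which tolerates messy markup). The names TitleFinder, page_label and build_toc are made up for illustration, and preferring the H1 over the title is just one possible choice. It assumes you already have the pages in reading order.

```python
from html import escape
from html.parser import HTMLParser


class TitleFinder(HTMLParser):
    """Collect the text of the first <h1> and the first <title> in a page."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._current = None                    # "h1" or "title" while inside one
        self._done = set()                      # tags already captured once
        self.found = {"h1": "", "title": ""}

    def handle_starttag(self, tag, attrs):
        # Only start capturing for the first occurrence of each tag.
        if self._current is None and tag in self.found and tag not in self._done:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._done.add(tag)
            self._current = None

    def handle_data(self, data):
        # Text inside nested tags (e.g. <h1><em>..</em></h1>) is kept too.
        if self._current:
            self.found[self._current] += data


def page_label(html_text, fallback):
    """Return the first H1 text, else the <title>, else the fallback."""
    finder = TitleFinder()
    finder.feed(html_text)
    finder.close()
    for tag in ("h1", "title"):
        text = " ".join(finder.found[tag].split())   # collapse whitespace
        if text:
            return text
    return fallback


def build_toc(pages):
    """pages: list of (filename, html_text) pairs, already in reading order."""
    items = []
    for name, text in pages:
        label = page_label(text, fallback=name)
        items.append(
            f'<li><a href="{escape(name, quote=True)}">{escape(label)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

You would read each saved file into a string, pass the list to build_toc, and write the result into an index page that links to the others. Pages with neither an H1 nor a title fall back to the file name, so those entries still need a manual look.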

Are there any automated tools for doing this?

For one thing, I have a Sony PRS-505, and I don't know how to make an ebook out of a dozen HTML files (using Calibre, for instance).