Old 10-22-2010, 04:46 PM   #10
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by oecherprinte View Post
I downloaded the site using HTTrack. Then I used calibre to read the index.html file. Afterwards I downloaded the index.html to the Kindle.
I'm surprised you didn't do a conversion from html to something else.

Quote:
The result did not look nice, and the Kindle somehow got stuck when reading the generated book, i.e. it went back to the main menu. So something went wrong. However, I can't see where in calibre I have any control over the conversion.
You have a great deal of control during conversion in Calibre. You can use header/footer removal to eliminate things. You can control the TOC with XPath expressions, control breaks, etc. If you convert to EPUB, you can always use Sigil to clean things up afterwards. Regardless, the first control you have is over the download/scraper that builds the index.html. One of the reasons I like wget is the control it gives me over the download.
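
To give an idea of the kind of control I mean with wget, here is a rough sketch (not a recipe for any particular site). It only builds the command line, so you can look at it before running anything. The helper name `build_wget_command`, the example URL, the depth and the rejected file types are all made up for illustration; the wget options themselves (`--recursive`, `--level`, `--no-parent`, `--convert-links`, `--adjust-extension`, `--reject`) are standard ones.

```python
import shlex


def build_wget_command(start_url, depth=2, reject=("zip", "pdf")):
    """Return a wget argument list for a limited mirror (hypothetical helper)."""
    return [
        "wget",
        "--recursive",                    # follow links from the start page
        f"--level={depth}",               # but only this many levels deep
        "--no-parent",                    # never climb above the start directory
        "--convert-links",                # rewrite links so they work offline
        "--adjust-extension",             # save pages with an .html suffix
        f"--reject={','.join(reject)}",   # skip file types you don't want
        start_url,
    ]


# Example URL is a placeholder, not a real target.
cmd = build_wget_command("http://example.com/docs/index.html")
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Tightening the depth and the reject list is usually the quickest way to keep junk out of the index.html before any converter ever sees it.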

Quote:
In Mobipocket Creator I can add all html files by dragging them into the tool. I can set html tags for creating a table of contents and the conversion to prc format is pretty quick. The results looked much better than in calibre and did not crash.
OK. I'm not trying to recommend one over the other. Whatever works best/easiest is what you should use. I suspect the result depends mostly on the starting html, which in turn depends on the format of the site you are scraping and the tool you use to do the scrape.
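
If you want a quick look at what the starting html gives a converter to work with, something like the following sketch counts the heading tags in a page. It assumes the chapters are marked with ordinary h1-h6 tags, which may not be true for your site; `HeadingCounter`, `count_headings` and the sample markup are made-up names and data for illustration only.

```python
from collections import Counter
from html.parser import HTMLParser


class HeadingCounter(HTMLParser):
    """Count h1-h6 start tags in an HTML document (illustrative helper)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us tag names already lowercased.
        if tag in self.HEADINGS:
            self.counts[tag] += 1


def count_headings(html_text):
    parser = HeadingCounter()
    parser.feed(html_text)
    parser.close()
    return dict(parser.counts)


# Made-up sample; in practice read the scraped index.html instead.
sample = "<html><body><h1>Book</h1><h2>Ch 1</h2><p>x</p><h2>Ch 2</h2></body></html>"
print(count_headings(sample))
```

If one heading level shows up once per chapter, that is probably the tag to point the TOC settings at, in whichever tool you use.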

I'm glad it worked for you.