Quote:
Originally Posted by Starson17
I started with HTTrack, but found some things it wouldn't do well for me. It's been a long time, and perhaps it has been updated, but I switched to wget and have been happy with it. I use it daily/hourly/weekly to automatically grab certain files for my wife on those sites that want you to come back each day/hour/week for something free.
I think that's a fair assessment when the website is "troublesome", i.e. it doesn't stay on the same URL path and/or goes off-domain with its hyperlinks.
The aforementioned MIT Press website book was very well constructed and "behaved nicely" when being spidered, so I didn't have much to worry about when using HTTrack. wget should handle it without any issues as well.
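For a well-behaved site like that one, a basic recursive wget mirror is usually all it takes. A minimal sketch (the URL is just a placeholder, and the exact flags depend on the site):

    wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --wait=1 https://example.com/book/

--convert-links rewrites the hyperlinks so the local copy browses offline, --page-requisites pulls in the CSS and images an ebook conversion would otherwise miss, and --no-parent keeps the crawl from wandering above the book's directory.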
There are some tricks/techniques I employ when dealing with a "poorly linked website" for spidering purposes, but they usually get used ONCE and then the spidering project is over.
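One common trick of that sort (shown here only as a sketch, with placeholder domains) is to let wget follow links onto other hosts, but only onto domains you explicitly allow, so a site that keeps its images or chapters on a second host still mirrors cleanly without the crawl escaping onto the whole web:

    wget --recursive --level=3 --span-hosts --domains=example.com,cdn.example.org --convert-links --page-requisites https://example.com/start.html

--span-hosts permits off-domain links, while --domains restricts them to the listed hosts; --level caps the recursion depth so a badly linked site can't drag the spider too far.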
For some websites that I've spidered and converted to ebooks in the past, see the bottom of this thread.