Quote:
Originally Posted by eschwartz
Good to hear it helped you.
Note: It should work everywhere, so long as the website offers download links.
It might not work in places where Content-Disposition headers rename the download or redirects are in place -- both lead to downloaded filenames that look like e.g. attachment.php?attachmentid=141344&d=1440341764 and are filtered out because we only accepted PDFs -- or where the website uses a robots.txt to forbid bot downloads.
The solution to all these is in advanced wget usage, for instance in my wgetrc (permanent configuration file) I have trust_server_names=on and content_disposition=on and robots=off. You can also pass those options with
Code:
--execute trust_server_names=on --execute content_disposition=on --execute robots=off
|
Hi, I tried it, and had exactly this problem!
[Could you give me an example of downloading a site using this command? (I'm a noob ...)]
Hmm, I think I figured out what you meant: I have to make the above changes in the wgetrc file, which I can't seem to find ...
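(For reference, in case it helps other noobs: the wgetrc doesn't exist by default -- you create it yourself as a plain text file, one setting per line. Based on the options in the quoted post, it would contain something like:)

Code:
trust_server_names = on
content_disposition = on
robots = off

(Where wget looks for the file depends on the build: it first checks the WGETRC environment variable, and on Unix falls back to ~/.wgetrc; on Windows builds the location varies, so check the documentation that came with your wget.exe.)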
Thanks in advance!
PS: I am using windows, and downloaded wget from this site:
https://builtvisible.com/download-yo...ite-with-wget/ (first link, under download wget) ...
PPS: I am also trying this:
http://www.jensroesner.de/wgetgui/, which is a wget GUI, probably the noob version of wget, and I'm getting html files, which is already a start; I will continue fiddling around ...
PPPS: What I would like to do is download some PDF articles from a journal (newleftreview.org), in a faster way ...
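(A sketch of what such a command might look like, combining the three --execute options from the quoted post with wget's recursive download and PDF filter. This is untested: the --level depth and the URL are guesses on my part, and the journal may require a login before it serves the full PDFs.)

Code:
wget --recursive --level=2 --no-parent --accept pdf --execute trust_server_names=on --execute content_disposition=on --execute robots=off https://newleftreview.org/

(--recursive follows links from the start page, --level=2 limits how deep it goes, --no-parent keeps it from wandering above the start URL, and --accept pdf keeps only files ending in .pdf.)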