Old 10-02-2012, 11:05 AM   #14
silver18
THE NOOB
Device: Kindle Touch 5.3.2
Ok, here I am...
The problem with "wgetting" a page from Wikipedia is that the plain "--page-requisites" option doesn't retrieve the images (to be honest, it gets only the HTML), so you need to turn on recursive download.
This is because the images are hosted on a separate host (upload.wikimedia.org), and wget doesn't follow links to other hosts by default!
BUT turning on "--recursive" gives you LOTS of things to download (as wiki pages are full of external & internal links).

So I ended up combining --recursive with --span-hosts (to let wget leave the original host) and --domains (to restrict it to the image host).
This is the complete command:
Code:
wget --recursive --span-hosts --domains=upload.wikimedia.org -e robots=off --random-wait --limit-rate=20K --page-requisites --no-parent --convert-links --adjust-extension --restrict-file-names=windows -U Mozilla -P "$folder" "$webpage"
Now the images are retrieved, but I can't get the style sheet...
I'm going to investigate where the hell it is hosted...
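One possible way to find out is to look at the stylesheet links inside the HTML that wget already saved. This is only a rough sketch: list_stylesheet_hosts is a made-up helper name, and it assumes each <link> tag sits on a single line and uses double-quoted attributes. Relative links (same host as the page) are skipped on purpose.

```shell
# Hypothetical helper (not part of wget): print the unique hosts
# referenced by <link rel="stylesheet"> tags in a saved HTML file.
# Assumption: each <link> tag is on one line with double-quoted attributes.
list_stylesheet_hosts() {
    # 1) keep only the stylesheet <link> tags
    # 2) pull out their href="..." attribute
    # 3) strip the attribute name and the quotes
    # 4) keep the host part of absolute or protocol-relative URLs
    grep -o '<link[^>]*rel="stylesheet"[^>]*>' "$1" |
        grep -o 'href="[^"]*"' |
        sed -e 's/^href="//' -e 's/"$//' |
        sed -n -e 's|^[a-z]*:*//\([^/]*\)/.*|\1|p' |
        sort -u
}
```

If it prints a host, that host could be appended to the --domains list (it accepts several comma-separated domains) so the recursive download is allowed to fetch the style sheet from there too.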