Old 09-05-2007, 01:10 PM   #101
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by nekokami
Ah. I was assuming you'd grab a copy of the file. After all, when you look at a website, you're effectively grabbing a copy of the HTML file (or whatever is generated by the script that creates the page, if we're not talking about static pages).
Yes, you are, but the HTML file is not a compressed archive that must be opened and examined. And whether you can get to it at all depends upon the site. Does it require a login/password?

Even if it doesn't, you may not be able to grab the file in a neatly automated manner. Sites use a file called robots.txt to tell web spiders which parts of the site they may crawl and which they should leave alone. Spiders that ignore robots.txt may just get their originating IP address blocked by the site they spider.
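
For anyone curious what "respecting robots.txt" looks like in practice, here is a rough sketch using Python's standard urllib.robotparser module. The robots.txt contents, the spider name "ExampleSpider", and the example.com URLs are all made up for illustration; a real spider would download the file from the site it is about to crawl rather than use a hard-coded string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, hard-coded so the sketch runs offline.
# A real spider would fetch http://<site>/robots.txt instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_fetch(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(SAMPLE_ROBOTS_TXT)

# "ExampleSpider" is an invented user-agent name for this sketch.
print(may_fetch(parser, "ExampleSpider", "http://example.com/books/index.html"))
print(may_fetch(parser, "ExampleSpider", "http://example.com/private/archive.zip"))
```

With these sample rules the first check should come back allowed and the second refused, since only /private/ is off limits. A polite spider runs a check like this before every request and skips anything the site has asked it to stay out of.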
______
Dennis