Thread: web2lrf
Old 12-04-2007, 12:37 AM   #107
JTravers
Groupie
Quote:
Originally Posted by kovidgoyal View Post
@JTravers
match_regexp works on the contents of the href attribute, i.e. the URL itself, not on the <a> tag.
Here's the code I'm using for the link regexp:
Code:
match_regexps = ['http://online.barrons.com/.*?html\?mod=.*?']
However, I can still see web pages being fetched from domains other than barrons.com. I've attached my profile for Barrons. You should be able to test it (at your convenience, of course) without supplying a username and password, since some articles are available to non-subscribers.
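One way to narrow this down might be to check the pattern against a few URLs outside web2lrf. The sketch below assumes the pattern is applied to each href with an unanchored search (re.search); that is a guess about web2lrf's internals, not something confirmed in this thread. The helper should_follow and all sample URLs are made up for illustration.

```python
import re

# Same pattern as in the profile, written as a raw string so that
# the backslash in \? reaches the regex engine unchanged.
match_regexps = [r'http://online.barrons.com/.*?html\?mod=.*?']


def should_follow(url, patterns=match_regexps, anchored=False):
    """Return True if any pattern matches the URL.

    anchored=False uses re.search (match anywhere in the URL);
    anchored=True uses re.match (match only from the start).
    Which one web2lrf uses is an assumption here.
    """
    matcher = re.match if anchored else re.search
    return any(matcher(p, url) is not None for p in patterns)


# Hypothetical URLs, not taken from the actual Barrons site.
samples = [
    'http://online.barrons.com/article/SB1234.html?mod=b_hps',
    'http://www.example.com/page.html',
    'http://www.example.com/go?u=http://online.barrons.com/a.html?mod=x',
]

for url in samples:
    print(url, should_follow(url), should_follow(url, anchored=True))
```

If the matching is unanchored, the third URL would be followed even though it lives on another domain, because the Barrons URL appears inside its query string. Starting the pattern with ^ would rule that case out. If none of the off-domain pages being fetched match the pattern at all, the cause is presumably somewhere other than the regexp.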
Attached Files
File Type: txt barrons.py.txt (3.6 KB, 570 views)