Quote:
Originally Posted by kidblue
Is there an obvious example of the recipe you're describing - One where the page is extracted as opposed to using the browser?
|
Not that I know of, but it's easy to build:
1) let's grab the article url with print_version:
Code:
def print_version(self, url):
print 'print_v url is: ', url
return url
By itself that does nothing, other than print the url of the article.
2) Now, let's turn it into a soup and print that:
Code:
def print_version(self, url):
print 'print_v url is: ', url
soup = self.index_to_soup(url)
print 'The soup is: ', soup
return url
Again, this code does nothing other than print the article page source and the url of that page. You'd need to find what you want in the soup, and return that.
Edit: To be clear, I'm not saying this code will work in your case. If the article links are obfuscated, the browser method is needed.