Quote:
Originally Posted by Starson17
As to the "Photo" issue - you want to skip articles that have that text in the link. I only know one way to do that. Perhaps someone else knows another. Basically, I know two ways to follow articles - to follow all the links in the automatically parsed feed, or to build your own feed (without the Photo links) with parse_index and then follow all of those links.
If there's another way - to follow some links, but not others, I don't know it.
I thought of "another way."
Use print_version.
Try this code:
Code:
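def print_version(self, url):
    # Requires "import re" at the top of the recipe.
    match = re.search(r'PhotoGallery', url)
    if not match:
        return url
    # Falls through and returns None for PhotoGallery links.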
print_version runs between the parsing and the fetching. The code above checks whether "PhotoGallery" is in the URL. If it is, the method returns nothing (None) instead of a URL, and that article is skipped.
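If you want to check the filter outside of calibre first, here is a rough standalone sketch. It is not recipe code: print_version is written as a plain function (no self), the example.com links are made up for illustration, and the list comprehension only imitates the skipping step, on the assumption that an article is dropped whenever print_version returns nothing.

```python
import re

def print_version(url):
    """Return url unchanged, or None for links containing 'PhotoGallery'."""
    if re.search(r'PhotoGallery', url):
        return None
    return url

# Hypothetical article links, for illustration only.
urls = [
    'http://example.com/news/story-1',
    'http://example.com/PhotoGallery/123',
    'http://example.com/sports/story-2',
]

# Imitate the skipping step: keep only links that were not filtered out.
kept = [u for u in urls if print_version(u)]
print(kept)
```

Note that the match is case-sensitive, so a link containing "photogallery" in lower case would not be filtered. Passing re.IGNORECASE to re.search would cover that if the site mixes cases.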