Quote:
Originally Posted by DarkElf
Is there a way to perform this substitution? Perhaps just in the skip_ad_pages method itself?
Perhaps, but I'd need to dig deeper than I have time for.
I've never needed skip_ad_pages, so I'm not familiar with it, and I know of only two other built-in recipes that use it. I'm a bit surprised you are running into this, since I would have expected the same problem to have shown up in those recipes.
FYI, here's the code for those other recipes:
Spoiler:
Code:
def skip_ad_pages(self, soup):
    # Skip ad pages served before actual article
    skip_tag = soup.find(True, {'name':'skip'})
    if skip_tag is not None:
        self.log.warn("Found forwarding link: %s" % skip_tag.parent['href'])
        url = 'http://www.nytimes.com' + re.sub(r'\?.*', '', skip_tag.parent['href'])
        url += '?pagewanted=all'
        self.log.warn("Skipping ad to article at '%s'" % url)
        return self.index_to_soup(url, raw=True)
Code:
def skip_ad_pages(self, soup):
    # Skip ad pages served before actual article
    skip_tag = soup.find(name='img', attrs={'alt':'Cyanide and Happiness, a daily webcomic'})
    if skip_tag is None:
        return soup
    return None
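If it helps while experimenting, the URL rewriting from the first recipe above can be tried on its own, outside calibre. This is only a rough sketch: the helper name build_full_article_url and the example path are made up for illustration and are not part of either recipe.

```python
import re


def build_full_article_url(href, base='http://www.nytimes.com'):
    # Hypothetical helper, not part of calibre or of either recipe.
    # Mirrors the first recipe: drop everything from the first '?'
    # onward, then request the single-page view of the article.
    url = base + re.sub(r'\?.*', '', href)
    url += '?pagewanted=all'
    return url


# Made-up example path, for illustration only
print(build_full_article_url('/2011/01/01/world/story.html?src=ad'))
```

A different substitution could be tested the same way by swapping the pattern passed to re.sub, before putting it back into skip_ad_pages.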
Whatever solution you find, post it here.