10-30-2013, 08:13 AM   #6
lucis_lupinum
Yeah, that's clear to me, but how should I identify the URLs I want? They will be different for every article. The only thing that is always the same is the element they are in.
I don't get it, sorry.
I looked into other recipes where this method is used, but they were all different and I didn't really know how to transfer them to my case :-S
And: should I return 'True' or another value? In most of the recipes mentioned, something different is returned:

Ciekawostki Historyczne:
Code:
    def is_link_wanted(self, url, tag):
        return 'ciekawostkihistoryczne' in url and url[-2] in {'2', '3', '4', '5', '6'}
Forbes:
Code:
    def is_link_wanted(self, url, tag):
        ans = re.match(r'http://.*/[2-9]/', url) is not None
        if ans:
            self.log('Following multipage link: %s'%url)
        return ans
hackernews:
Code:
    def is_link_wanted(self, url, tag):
        if url.endswith('.pdf'):
            return False
        return True
and the one for Kopalnia Wiedzy:
Code:
    def is_link_wanted(self, url, tag):
        return tag['class'] == 'next'
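Looking at these again, all four seem to boil down to a plain True/False: the comparisons and the `in` checks evaluate to a boolean, so "something different" is really just a different way of computing it. Since the element is the only constant in my case, the Kopalnia Wiedzy approach (testing the tag instead of the URL) looks closest. Below is a minimal standalone sketch of that idea, not a tested recipe: the function name `link_is_wanted`, the plain dict standing in for the tag's attributes, the class name 'next' and the example.com URLs are all assumptions for illustration.

```python
def link_is_wanted(url, tag_attrs):
    """Return True to follow the link, False to skip it.

    tag_attrs is a plain dict standing in for the attributes of the
    <a> tag (hypothetical stand-in, so this runs without calibre).
    """
    # Skip links that point at files rather than article pages
    if url.endswith('.pdf'):
        return False
    # .get() returns None instead of raising KeyError when the tag
    # has no class attribute, so the comparison is simply False
    return tag_attrs.get('class') == 'next'


print(link_is_wanted('http://example.com/article/2/', {'class': 'next'}))
print(link_is_wanted('http://example.com/article/2/', {}))
print(link_is_wanted('http://example.com/paper.pdf', {'class': 'next'}))
```

Inside a real recipe the same check would go in `is_link_wanted(self, url, tag)` and look at the actual tag object instead of a dict.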