Yeah, that's clear to me, but how should I identify the URLs I want? They will be different for every article. The one thing that is always the same is the element they are in.
I don't get it, sorry.

I looked into other recipes where this method is used, but they were all different and I didn't really know how to transfer them to my case :-S
And should I return True or some other value? Most of the recipes mentioned return something different:
Ciekawostki Historyczne:
Code:
def is_link_wanted(self, url, tag):
    return 'ciekawostkihistoryczne' in url and url[-2] in {'2', '3', '4', '5', '6'}
Forbes:
Code:
def is_link_wanted(self, url, tag):
    ans = re.match(r'http://.*/[2-9]/', url) is not None
    if ans:
        self.log('Following multipage link: %s'%url)
    return ans
hackernews:
Code:
def is_link_wanted(self, url, tag):
    if url.endswith('.pdf'):
        return False
    return True
and the one for
Kopalnia Wiedzy:
Code:
def is_link_wanted(self, url, tag):
    return tag['class'] == 'next'
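As far as I can tell, all four of these boil down to a true/false answer: True means the link is followed, False means it is skipped. They only differ in how they decide. Since in my case only the enclosing element is constant, here is a rough sketch of matching on that element instead of on the URL. It is not tested inside calibre: FakeTag is a made-up stand-in for the tag object that gets passed in, the class name 'pagination' is just a placeholder, and I am assuming the real tag exposes .get() and .parent the way a BeautifulSoup tag does. In an actual recipe the function would be a method taking self as its first argument.

```python
# Sketch only: mimics the is_link_wanted(url, tag) hook outside calibre.
# FakeTag is a hypothetical stand-in for the <a> tag calibre passes in.

class FakeTag:
    def __init__(self, name, attrs=None, parent=None):
        self.name = name
        self.attrs = attrs or {}
        self.parent = parent

    def get(self, key, default=None):
        return self.attrs.get(key, default)


def has_class(tag, wanted):
    """True if `wanted` is among the tag's classes (string or list form)."""
    classes = tag.get('class') or []
    if isinstance(classes, str):
        classes = classes.split()
    return wanted in classes


def is_link_wanted(url, tag, wanted_class='pagination'):
    """Follow the link only if it sits inside an element with `wanted_class`.

    'pagination' is a placeholder; use whatever class the site really has.
    """
    node = tag
    while node is not None:
        if has_class(node, wanted_class):
            return True
        node = node.parent  # walk up towards the document root
    return False
```

With this approach the URL itself never has to be inspected, which is the point when it differs for every article.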