MobileRead Forums: Golem.de (German tech news) multipage article

lucis_lupinum · 10-30-2013, 09:13 AM

Yeah, that's clear to me, but how do I identify the URLs I want? They are different for every article; the only thing that is always the same is the element they are in.
I don't get it, sorry.

I looked into other recipes where this method is used, but they were all different and I didn't really know how to transfer them to my case :-S
Also: should I return 'True' or some other value? In most of the recipes I looked at, something different seems to be returned:

Ciekawostki Historyczne:

Code:

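    def is_link_wanted(self, url, tag):
        # Follow only links on this site whose second-to-last character
        # (the page number before the trailing '/') is 2 to 6
        return 'ciekawostkihistoryczne' in url and url[-2] in {'2', '3', '4', '5', '6'}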

Forbes:

Code:

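    def is_link_wanted(self, url, tag):
        # Requires "import re" at the top of the recipe file.
        # Follow only URLs that contain a /2/ to /9/ page segment.
        ans = re.match(r'http://.*/[2-9]/', url) is not None
        if ans:
            self.log('Following multipage link: %s' % url)
        return ans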

hackernews:

Code:

    def is_link_wanted(self, url, tag):
        # Follow every link except PDF files
        if url.endswith('.pdf'):
            return False
        return True

and the one for Kopalnia Wiedzy:

Code:

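    def is_link_wanted(self, url, tag):
        # tag is the <a> element the URL came from; follow it only if its
        # class is "next" (.get avoids a KeyError on links without a class)
        return tag.get('class') == 'next'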
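To check what the four snippets above actually return, here is a minimal sketch that pulls their bodies out as plain functions (dropping self and the Forbes logging call) and runs them on made-up URLs. The function names are hypothetical labels, and a plain dict stands in for the BeautifulSoup tag in the Kopalnia Wiedzy case because it supports the same .get() lookup.

```python
import re

# Standalone copies of the four quoted is_link_wanted bodies, with `self`
# removed so they run outside calibre. The URLs below are made up.


def ciekawostki(url, tag=None):
    # Page number is the character just before the trailing '/'.
    return 'ciekawostkihistoryczne' in url and url[-2] in {'2', '3', '4', '5', '6'}


def forbes(url, tag=None):
    # True when the URL contains a /2/ ... /9/ page segment.
    return re.match(r'http://.*/[2-9]/', url) is not None


def hackernews(url, tag=None):
    # Same logic as the if/return pair: follow everything except PDFs.
    return not url.endswith('.pdf')


def kopalnia(url, tag):
    # A dict stands in for the <a> tag; .get() mirrors the tag lookup.
    return tag.get('class') == 'next'


checks = [
    ciekawostki('http://ciekawostkihistoryczne.example/artykul/2/'),
    ciekawostki('http://ciekawostkihistoryczne.example/artykul/'),
    forbes('http://example.com/story/2/'),
    forbes('http://example.com/story/'),
    hackernews('http://example.com/paper.pdf'),
    hackernews('http://example.com/post'),
    kopalnia('http://example.com/next', {'class': 'next'}),
    kopalnia('http://example.com/other', {}),
]
for result in checks:
    print(type(result).__name__, result)
```

Every line prints bool, so despite the different expressions, each recipe returns a plain True (follow the link) or False (skip it); they differ only in how they compute that value.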
