08-03-2015, 04:44 PM   #3
mikebw
Thank you, that worked perfectly!

By overriding 'get_article_url', it is easy to inspect the 'title' of the 'article' passed into it, search that title for a regex (here simply "Lovecraft"), and decide whether or not to retrieve the article.
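A minimal sketch of that filtering step, run outside calibre so it can be tried in a plain Python shell. The sample entries are made-up dictionaries standing in for feedparser items (which also support .get); the titles and links are hypothetical.

```python
import re

# Hypothetical sample entries shaped like feedparser items.
entries = [
    {'title': 'Lovecraft at 125', 'link': 'http://example.com/1'},
    {'title': 'Summer reading list', 'link': 'http://example.com/2'},
    {'link': 'http://example.com/3'},  # entry with no title at all
]

def get_article_url(article):
    # Same test as the recipe: return None (skip) unless the title matches.
    title = article.get('title') or ''
    if re.search(r'Lovecraft', title) is None:
        return None
    return article.get('link', None)

# Keep only the links the filter would retrieve.
kept = [url for url in (get_article_url(e) for e in entries) if url is not None]
print(kept)  # ['http://example.com/1']
```

Using `or ''` for the title means an entry without a title is skipped instead of making re.search fail on None.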

It probably would have been clearer to write the test in the affirmative (if the regex is present, retrieve the article; otherwise, don't), but I wrote it this way because I developed it by trial and error: first I worked out how to retrieve all articles, and only then wrote an if-then test that skipped some.
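For comparison, here is a sketch of the same test written in the affirmative. It is shown as a module-level function so it can be tried outside calibre; inside a recipe it would be a method taking `self` as its first argument. `TITLE_PATTERN` is a name introduced here for illustration.

```python
import re

# Precompiled pattern; same keyword the recipe searches for.
TITLE_PATTERN = re.compile(r'Lovecraft')

def get_article_url(article):
    # Affirmative form: return the link only when the title matches.
    title = article.get('title') or ''
    if TITLE_PATTERN.search(title):
        return article.get('link')
    return None
```

Both forms select the same articles; the difference is only in how the condition reads.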

Despite the excellent quality of the source code examples in Calibre, my knowledge of Python is close to non-existent, so I have to look everything up in the documentation.

Here is the code, tested and working, in case it is useful as an example to someone:

Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
from __future__ import unicode_literals, division, absolute_import, print_function
import re

from calibre.web.feeds.news import BasicNewsRecipe

class ProJoLovecraft(BasicNewsRecipe):
    title          = 'ProJo Lovecraft'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup   = True
    recursion      = 1

    def print_version(self, url):
        # Request the site's printer-friendly template for each article
        return url + '&template=printart'

    def get_article_url(self, article):
        # Skip any article whose title does not mention Lovecraft by
        # returning None; otherwise return the article's link.
        title = article.get('title') or ''
        if re.search(r'Lovecraft', title) is None:
            return None
        return article.get('link', None)

    feeds          = [
        ('Lovecraft', 'http://www.providencejournal.com/entertainment/books?template=rss&mime=xml'),
    ]
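The recipe's print_version appends '&template=printart' directly, which assumes every article link already carries a query string. If some links might not, a slightly more defensive sketch (Python 3 standard library; `add_print_template` is a hypothetical name, not a calibre API) chooses '?' or '&' as needed and keeps any fragment at the end:

```python
from urllib.parse import urlsplit, urlunsplit

def add_print_template(url, param='template=printart'):
    # Hypothetical helper: add the print-template parameter to the query,
    # starting a new query string if the URL has none yet.
    parts = urlsplit(url)
    query = parts.query + '&' + param if parts.query else param
    return urlunsplit(parts._replace(query=query))

print(add_print_template('http://example.com/books/story'))
# http://example.com/books/story?template=printart
print(add_print_template('http://example.com/books/story?id=1'))
# http://example.com/books/story?id=1&template=printart
```

Inside the recipe, print_version could then simply return add_print_template(url).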