MobileRead Forums

Derry · 08-11-2008, 10:13 AM

Hi,
I've been using this great software since the day I got my PRS505, and first of all I'd like to say thanks for it.

I've been having a problem with a script to fetch RSS feeds for a while now and was wondering if someone could point out where I'm going wrong.

Originally I was using the script generated by Calibre when you add the feeds:

Code:

class AdvancedUserRecipe1218463759(BasicNewsRecipe):
    title          = u'International Herald Tribune'
    oldest_article = 1
    max_articles_per_feed = 15
    
    feeds          = [(u'Frontpage', u'http://www.iht.com/rss/frontpage.xml'), (u'Business', u'http://www.iht.com/rss/business.xml'), (u'Americas', u'http://www.iht.com/rss/america.xml'), (u'Europe', u'http://www.iht.com/rss/europe.xml'), (u'Asia', u'http://www.iht.com/rss/asia.xml'), (u'Africa and Middle East', u'http://www.iht.com/rss/africa.xml'), (u'Opinion', u'http://www.iht.com/rss/opinion.xml'), (u'Technology', u'http://www.iht.com/rss/technology.xml'), (u'Health and Science', u'http://www.iht.com/rss/healthscience.xml'), (u'Sports', u'http://www.iht.com/rss/sports.xml'), (u'Culture', u'http://www.iht.com/rss/arts.xml'), (u'Style and Design', u'http://www.iht.com/rss/style.xml'), (u'Travel', u'http://www.iht.com/rss/travel.xml'), (u'At Home Abroad', u'http://www.iht.com/rss/athome.xml'), (u'Your Money', u'http://www.iht.com/rss/yourmoney.xml'), (u'Properties', u'http://www.iht.com/rss/properties.xml')]

The problem is that while it pulls down the feeds fine, the fonts in the table of contents are too large, and there are a lot of things that could be cleaned up. So I tried writing a custom script, but every attempt to modify it causes it to include photos, ads, etc., and the script then takes over half an hour to run as opposed to a few minutes.

Here is the current version of my custom script:

Code:

from calibre.web.feeds.news import BasicNewsRecipe

class InternationalHeraldTribune(BasicNewsRecipe):
    title          = u'The International Herald Tribune'
    __author__     = 'Derry FitzGerald'
    oldest_article = 1
    max_articles_per_feed = 15
    no_stylesheets = True

    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }' 

    feeds          = [
                      (u'Frontpage', u'http://www.iht.com/rss/frontpage.xml'), 
                      (u'Business', u'http://www.iht.com/rss/business.xml'),
                      (u'Americas', u'http://www.iht.com/rss/america.xml'),
                      (u'Europe', u'http://www.iht.com/rss/europe.xml'),
                      (u'Asia', u'http://www.iht.com/rss/asia.xml'),
                      (u'Africa and Middle East', u'http://www.iht.com/rss/africa.xml'),
                      (u'Opinion', u'http://www.iht.com/rss/opinion.xml'),
                      (u'Technology', u'http://www.iht.com/rss/technology.xml'),
                      (u'Health and Science', u'http://www.iht.com/rss/healthscience.xml'),
                      (u'Sports', u'http://www.iht.com/rss/sports.xml'),
                      (u'Culture', u'http://www.iht.com/rss/arts.xml'),
                      (u'Style and Design', u'http://www.iht.com/rss/style.xml'),
                      (u'Travel', u'http://www.iht.com/rss/travel.xml'),
                      (u'At Home Abroad', u'http://www.iht.com/rss/athome.xml'),
                      (u'Your Money', u'http://www.iht.com/rss/yourmoney.xml'),
                      (u'Properties', u'http://www.iht.com/rss/properties.xml')
                    ]

Hopefully it is something simple that I'm missing.

Also, here is a simple script for the Irish Times which works pretty well and might be of use to someone:

Code:

#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Derry FitzGerald'
'''
irishtimes.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class IrishTimes(BasicNewsRecipe):
    title          = u'The Irish Times'
    __author__     = 'Derry FitzGerald'
    no_stylesheets = True

    remove_tags    = [dict(name='div', attrs={'class':'footer'})]
    extra_css      = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt  }' 

    feeds          = [
                      ('Frontpage', 'http://www.irishtimes.com/feeds/rss/newspaper/index.rss'), 
                      ('Ireland', 'http://www.irishtimes.com/feeds/rss/newspaper/ireland.rss'),
                      ('World', 'http://www.irishtimes.com/feeds/rss/newspaper/world.rss'),
                      ('Finance', 'http://www.irishtimes.com/feeds/rss/newspaper/finance.rss'),
                      ('Features', 'http://www.irishtimes.com/feeds/rss/newspaper/features.rss'),
                      ('Sport', 'http://www.irishtimes.com/feeds/rss/newspaper/sport.rss'),
                      ('Opinion', 'http://www.irishtimes.com/feeds/rss/newspaper/opinion.rss'),
                      ('Letters', 'http://www.irishtimes.com/feeds/rss/newspaper/letters.rss'),
                      ('Health', 'http://www.irishtimes.com/feeds/rss/newspaper/health.rss'),
                      ('Education and Parenting', 'http://www.irishtimes.com/feeds/rss/newspaper/education.rss'),
                      ('Science Today', 'http://www.irishtimes.com/feeds/rss/newspaper/sciencetoday.rss'),
                      ('The Ticket', 'http://www.irishtimes.com/feeds/rss/newspaper/theticket.rss'),
                      ('Weekend', 'http://www.irishtimes.com/feeds/rss/newspaper/weekend.rss'),
                      ('News Features', 'http://www.irishtimes.com/feeds/rss/newspaper/newsfeatures.rss'),
                      ('Magazine', 'http://www.irishtimes.com/feeds/rss/newspaper/magazine.rss'),
                    ]

    def print_version(self, url):
        return url.replace('.html', '_pf.html')
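
For anyone wanting to check the URL rewriting without running a full download, here is a minimal standalone sketch of what print_version does. The function is copied out of the class so it runs without Calibre, and the sample URL is made up purely for illustration (it is not a real article address).

```python
def print_version(url):
    """Standalone copy of the recipe's print_version logic (no Calibre needed)."""
    # Swap the '.html' suffix for '_pf.html' to get the printer-friendly page.
    # Note: str.replace changes every '.html' in the string, not just the last one.
    return url.replace('.html', '_pf.html')


# Hypothetical article URL, used only to illustrate the rewrite.
sample = 'http://www.irishtimes.com/newspaper/frontpage/2008/0811/example.html'
print(print_version(sample))

# A URL without '.html' is returned unchanged.
print(print_version('http://www.irishtimes.com/feeds/rss/newspaper/index.rss'))
```

If the printed address opens a page without the navigation and ads in a browser, the same rewrite should work inside the recipe.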

Thanks in advance for any assistance.
Derry