05-24-2009, 07:52 AM #530
Derry
Join Date: Aug 2008
Device: Sony PRS505
IHT

For those looking for a Global NYTimes recipe, here is my attempt. There are a couple of problems: there are too many blank pages at the end of each article, and it doesn't grab the second (and later) pages of longer articles. I tried URL replacement, opening the browser version, etc., but I don't understand enough to get it working properly. Still, it might be of use to some people.

Derry

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1241195948(BasicNewsRecipe):
    title          = u'IHT/Global NYT'
    oldest_article = 1  # days
    max_articles_per_feed = 10
    # Drop everything before and after the element with id="article"
    remove_tags_before = dict(id='article')
    remove_tags_after  = dict(id='article')
    remove_tags = [dict(attrs={'class':['articleTools', 'post-tools', 'side_tool', 'nextArticleLink clearfix']}),
                   dict(id=['footer', 'toolsRight', 'articleInline', 'navigation', 'archive', 'side_search', 'blog_sidebar', 'side_tool', 'side_index']),
                   dict(name=['script', 'noscript', 'style'])]
    encoding = 'cp1252'
    no_stylesheets = True
    # The font shorthand needs the size before the family; use font-family for the byline
    extra_css = 'h1 {font: large sans-serif;}\n.byline {font-family: monospace;}'
    feeds = [
        (u'Frontpage', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalHome.xml'), (u'Europe', u'http://www.nytimes.com/services/xml/rss/nyt/Europe.xml'),
        (u'Americas', u'http://www.nytimes.com/services/xml/rss/nyt/Americas.xml'), (u'Africa', u'http://www.nytimes.com/services/xml/rss/nyt/Africa.xml'),
        (u'Asia Pacific', u'http://www.nytimes.com/services/xml/rss/nyt/AsiaPacific.xml'), (u'Middle East', u'http://www.nytimes.com/services/xml/rss/nyt/MiddleEast.xml'),
        (u'Opinion', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalOpinion.xml'), (u'Business', u'http://www.nytimes.com/services/xml/rss/nyt/WorldBusiness.xml'),
        (u'Technology', u'http://feeds.nytimes.com/nyt/rss/Technology'), (u'Sports', u'http://www.nytimes.com/services/xml/rss/nyt/GlobalSports.xml'),
        (u'Science', u'http://www.nytimes.com/services/xml/rss/nyt/Science.xml'), (u'Environment', u'http://www.nytimes.com/services/xml/rss/nyt/Environment.xml'),
        (u'Health', u'http://www.nytimes.com/services/xml/rss/nyt/Health.xml'), (u'Arts', u'http://www.nytimes.com/services/xml/rss/nyt/Arts.xml'),
        (u'Travel', u'http://www.nytimes.com/services/xml/rss/nyt/Travel.xml'),
    ]
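One possible approach to the missing second pages, not tested against these feeds, is to override calibre's `print_version()` method so that each article URL is rewritten to a single-page version before download. The sketch below assumes that nytimes.com serves the whole article when the query string contains `pagewanted=all`; that parameter is an assumption to check against a live multi-page article. The helper name `single_page_url` is hypothetical, and the code uses Python 3's `urllib.parse` (current calibre runs on Python 3).

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def single_page_url(url, param='pagewanted', value='all'):
    """Return `url` with `param=value` set in its query string.

    Hypothetical helper: assumes the site serves the whole article when
    the query contains pagewanted=all. Any existing `param` is replaced.
    """
    parts = urlsplit(url)
    # Keep every other query parameter, dropping an old pagewanted=N.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != param]
    query.append((param, value))
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


# Inside the recipe class, calibre calls print_version() with each
# article URL and downloads whatever URL it returns:
#
#     def print_version(self, url):
#         return single_page_url(url)
```

Other query parameters on the URL are preserved, and an existing `pagewanted=N` is replaced rather than duplicated. This sketch does not address the blank pages at the end of articles.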