I need help fixing a custom recipe for the SLC Tribune. The recipe mostly works, but roughly every third news article comes out as garbage. Also, when I tried to add the Technology section, it did not pull any articles. I'm not sure why, since the other sections work. The recipe is given below. Many thanks!!
SLTRIB RECIPE:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1278347258(BasicNewsRecipe):
    title = u'Salt Lake City Tribune'
    __author__ = 'Charles Holbert'
    oldest_article = 1
    max_articles_per_feed = 100
    description = '''Utah's independent news source since 1871'''
    publisher = 'http://www.sltrib.com/'
    category = 'news, Utah, SLC'
    language = 'en'
    encoding = 'utf-8'
    remove_javascript = True
    use_embedded_content = False
    no_stylesheets = True
    remove_tags = [dict(name='div', attrs={'id': ['teaser', 'adCol', 'keywordStories']}),
                   dict(name='div', attrs={'class': 'tripleWide datos'})]
    keep_only_tags = [dict(name='div', attrs={'class': 'theImage'}),
                      dict(name='div', attrs={'id': 'topImageCaption'}),
                      dict(name='div', attrs={'class': 'theHeadline entry-title'}),
                      dict(name='div', attrs={'class': 'byline'}),
                      dict(name='div', attrs={'id': 'storytext'})]
    feeds = [(u'SL Tribune Today', u'http://www.sltrib.com/csp/cms/sites/sltrib/RSS/rss.csp?cat=All'),
             (u'Utah News', u'http://www.sltrib.com/csp/cms/sites/sltrib/RSS/rss.csp?cat=UtahNews'),
             (u'Business News', u'http://www.sltrib.com/csp/cms/sites/sltrib/RSS/rss.csp?cat=Money'),
             (u'Most Popular', u'http://www.sltrib.com/csp/cms/sites/sltrib/RSS/rsspopular.csp'),
             (u'Sports', u'http://www.sltrib.com/csp/cms/sites/sltrib/RSS/rss.csp?cat=Sports')]
    extra_css = '''
        .theHeadline{font-family:Arial,Helvetica,sans-serif; font-size:xx-large; font-weight: bold; color:#0E5398;}
        .byline{font-family:Arial,Helvetica,sans-serif; color:#333333; font-size:xx-small;}
        .storytext{font-family:Arial,Helvetica,sans-serif; font-size:medium;}
        .articleText{font-family:Arial,Helvetica,sans-serif; font-size:medium;}
        .caption{font-family:Arial,Helvetica,sans-serif; font-size:xx-small; margin-bottom: 1em;}
        '''
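
One thing that might be worth ruling out for the empty Technology section: with oldest_article = 1, any feed whose newest items are more than a day old will come back empty. I have not confirmed that this is the cause here. The sketch below is a standalone check, separate from the recipe, that counts how many items in a saved copy of an RSS feed fall inside that window. The function name recent_items and its parameters are made up for illustration and are not part of calibre; it also assumes the feed carries standard RSS pubDate values.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
import xml.etree.ElementTree as ET


def recent_items(rss_xml, oldest_article_days=1, now=None):
    """Return titles of RSS <item> elements dated within the last N days.

    Hypothetical helper for debugging; it only mimics the idea of the
    recipe's oldest_article setting and is not calibre's own logic.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=oldest_article_days)
    titles = []
    for item in ET.fromstring(rss_xml).iter('item'):
        pub = item.findtext('pubDate')
        if not pub:
            continue  # undated item: skipped in this sketch
        try:
            when = parsedate_to_datetime(pub)
        except (TypeError, ValueError):
            continue  # unparseable date: skipped in this sketch
        if when.tzinfo is None:
            # Assumption: treat dates without a usable zone as UTC.
            when = when.replace(tzinfo=timezone.utc)
        if when >= cutoff:
            titles.append(item.findtext('title', default=''))
    return titles
```

To try it, save the feed XML from the browser to a file, read it into a string, and call recent_items(text, 1). If the result is an empty list but recent_items(text, 7) is not, raising oldest_article in the recipe would be the first thing to test.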