Old 01-09-2012, 12:05 AM   #4
bburky
The URL structure of the RSS feeds changed slightly for Tulsa World. Instead of http://www.tulsaworld.com/site/rss.aspx?group=1 it is now http://www.tulsaworld.com/site/rss/rss.aspx?group=1

The http://www.tulsaworld.com/site/rss/ page lists all the RSS feeds for Tulsa World.
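If you have an older copy of the recipe with your own feed selection, the change is just one extra path segment. Here is a rough sketch of rewriting old feed URLs to the new form; migrate_feed_url is a name I made up for illustration and is not part of calibre or the recipe.

```python
# Hypothetical helper: rewrite an old Tulsa World feed URL to the new layout.
OLD_PATH = '/site/rss.aspx'
NEW_PATH = '/site/rss/rss.aspx'


def migrate_feed_url(url):
    # The old path never occurs inside the new one, so URLs that are
    # already migrated pass through unchanged.
    return url.replace(OLD_PATH, NEW_PATH, 1)


old = 'http://www.tulsaworld.com/site/rss.aspx?group=1'
print(migrate_feed_url(old))
```

Running it twice on the same URL is harmless, so it should be safe to apply to a mixed list of old and new URLs.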

Here's a new version of the recipe with the new RSS feed URLs. I've also included the subcategory feeds, commented out; uncomment whichever ones you want.

Code:
__license__   = 'GPL v3'
__copyright__ = '2010, Darko Miletic <darko.miletic at gmail.com>'
'''
tulsaworld.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class TulsaWorld(BasicNewsRecipe):
    title                 = 'Tulsa World'
    __author__            = 'Darko Miletic'
    description           = 'Find breaking news, local news, Oklahoma weather, sports, business, entertainment, lifestyle, opinion, government, movies, books, jobs, education, blogs, video & multimedia.'
    publisher             = 'World Publishing Co.'
    category              = 'Tulsa World, tulsa world, daily newspaper, breaking news, stories, articles, news, local, weather, coverage, editorial, government, education, community, sports, business, entertainment, lifestyle, opinion, multimedia, media, blogs, consumer, OU, OSU, TU, ORU, football, basketball, school, schools, sudoku, movie reviews, stocks, classified ads, classifieds, books, job, jobs, careers, real estate, home, homes, Oklahoma, northeastern, reviews, auto, autos, archives, forecasts, Sooners, Cowboys, Hurricane, Golden Eagles, NFL, NBA, MLB, pro football, scores, college basketball, college football, college baseball, sports columns, fashion and style, associated press, regional news coverage, health, obituaries, politics, political news, Jenks, Union, Owasso, Tulsa, Booker T. Washington, Trojans, Rams, Hornets, video, photography, photos, images, games, search, the picker, predictions, satellite, family, food, teens, polls, births, celebrations, death notices, divorces, marriages, obituaries, audio, podcasts.'
    oldest_article        = 2
    max_articles_per_feed = 200
    no_stylesheets        = True
    encoding              = 'utf8'
    use_embedded_content  = False
    language              = 'en'
    country               = 'US'
    remove_empty_feeds    = True
    masthead_url          = 'http://www.tulsaworld.com/images/TW_logo-blue-footer.jpg'
    extra_css             = ' body{font-family: Arial,Verdana,sans-serif } img{margin-bottom: 0.4em} .articleHeadline{font-size: xx-large; font-weight: bold} .articleKicker{font-size: x-large; font-weight: bold} .articleByline,.articleDate{font-size: small} .leadp{font-size: 1.1em} '

    conversion_options = {
                          'comment'          : description
                        , 'tags'             : category
                        , 'publisher'        : publisher
                        , 'language'         : language
                        , 'linearize_tables' : True
                        }
    keep_only_tags = [dict(name='div',attrs={'id':['ctl00_body1_ArticleControl_divArticleText','ctl00_BodyContent_ArticleControl_divArticleText']})]

    feeds = [
        # The first feed of each category is an aggregation of the subcategories that follow
        # For example, "News" contains "Local", "State", "Legal", etc.

        ### News
        (u'News', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=1'),
        #(u'Local', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=11'),
        #(u'State', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=12'),
        #(u'Legal', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=14'),
        #(u'Consumer Awareness', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=15'),
        #(u'Government', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=16'),
        #(u'Health & Fitness', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=17'),
        #(u'Religion', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=18'),
        #(u'Education', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=19'),
        #(u'Jay Cronley', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=206'),
        #(u'SemGroup', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=351'),
        #(u'Inhofe', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=447'),
        #(u'Coburn', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=448'),
        #(u'Sullivan', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=449'),

        ### Sports
        (u'Sports', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=2'),
        #(u'OU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=92'),
        #(u'OSU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=93'),
        #(u'TU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=94'),
        #(u'ORU', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=95'),
        #(u'Dave Sittler', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=202'),
        #(u'John Klein', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=203'),
        #(u'The Picker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=204'),
        #(u'High School Football', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=227'),
        #(u'Boys Basketball', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=230'),
        #(u'College Football', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=231'),
        #(u'College Basketball', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=234'),

        ### Scene
        (u'Scene', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=4'),
        #(u'Food', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=39'),
        #(u'Home & Garden', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=41'),
        #(u'People', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=42'),
        #(u'Style', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=43'),
        #(u'Celebrations', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=59'),
        #(u'Scott Cherry', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=207'),
        #(u'Jason Ashley Wright', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=208'),
        #(u'Column - Walker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=209'),
        #(u'Garden', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=517'),

        ### Business
        (u'Business', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=5'),
        #(u'Tech', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=52'),

        ### Transitions
        #(u'Transitions', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=6'),
        #(u'Births', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=55'),
        #(u'Obits: Death Notices', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=56'),
        #(u'Divorces', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=57'),
        #(u'Obits: Obituaries (News Obits)', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=58'),
        #(u'Marriages', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=60'),

        ### Opinion
        (u'Opinion', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=7'),
        #(u'Letters to the Editor', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=62'),
        #(u'Political Cartoon', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=63'),
        #(u'Janet Pearson', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=211'),
        #(u'Column - Jones', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=213'),
        #(u'Julie Delcour', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=214'),
        #(u'David Averill', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=215'),

        ### Community
        (u'Community', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=9'),


        ### Blog Feeds
        # No combined category feeds
        # These blog feeds are untested and will likely not work at all.
        # Some of these links also appear to be broken.

        # NOT WORKING:

        ### Sports Blogs
        #(u'Mike Strain', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=8&blog=1'),
        #(u'John Klein', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=15&blog=1'),
        #(u'Dave Sittler', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=13&blog=1'),
        #(u'Jimmie Tramel', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=9&blog=1'),
        #(u'Bill Haisten', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=16&blog=1'),
        #(u'The Picker', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=14&blog=1'),
        #(u'OSU Cowboys', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=11&blog=1'),
        #(u'OU Sooners', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=12&blog=1'),
        #(u'TU Golden Hurricane', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=10&blog=1'),
        #(u'ORU Golden Eagles', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=17&blog=1'),
        #(u'High School', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=26&blog=1'),

        ### Lifestyle Blogs
        #(u'Scott Cherry', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=21&blog=1'),
        #(u'Natalie Mikles', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=22&blog=1'),
        #(u'Michael Smith', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=5&blog=1'),
        #(u'Jason Ashley Wright', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=3&blog=1'),
        #(u'Jennifer Chancellor', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=29&blog=1'),

        ### Opinion Blogs
        #(u'Mike Jones', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=31&blog=1'),
        #(u'Wayne Greene', u'http://www.tulsaworld.com/site/rss/rss.aspx?group=30&blog=1')
    ]

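    def get_article_url(self, article):
        link = article.get('link', None)
        if not link:
            return None
        # Drop the trailing '&rss...' parameter; if it is absent, keep the
        # link unchanged (rpartition would otherwise return an empty string).
        head, sep, _tail = link.rpartition('&rss')
        return head if sep else link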

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        return self.adeify_images(soup)


There are also various columnist blogs with RSS feeds listed. I included them, commented out, but I believe they will not work: those pages are laid out quite differently from the rest of the site, so I'm not sure the article text will be extracted correctly. You're welcome to check, though.
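All of the feeds, blogs included, use the same rss.aspx page with a group number, and the blog feeds just add blog=1. If you want to experiment, here is a small sketch that builds the feed tuples from group numbers instead of typing out each URL. The feed() helper is my own invention for illustration, not something calibre provides, and the group numbers are taken from the recipe above.

```python
# Hypothetical helper for building (title, url) feed tuples.
BASE = 'http://www.tulsaworld.com/site/rss/rss.aspx'


def feed(title, group, blog=False):
    url = '%s?group=%d' % (BASE, group)
    if blog:
        # Blog feeds use the same endpoint with an extra parameter.
        url += '&blog=1'
    return (title, url)


feeds = [
    feed('News', 1),
    feed('Sports', 2),
    feed('Mike Jones', 31, blog=True),  # untested blog feed
]

for title, url in feeds:
    print('%s: %s' % (title, url))
```

The resulting list has the same shape as the feeds attribute in the recipe, so it could be assigned there directly.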

I've only lightly tested this, but it should work fine.
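For anyone adapting the link cleanup in get_article_url: str.rpartition returns an empty first element when the separator is missing, so the result needs a check before it is used as the article URL. Below is a minimal standalone sketch of that check. strip_rss_suffix is a hypothetical name, and the example links are made up; I have not verified what the real feed links look like for every feed.

```python
def strip_rss_suffix(link, marker='&rss'):
    """Return link without the last marker and everything after it."""
    if not link:
        return link
    head, sep, _tail = link.rpartition(marker)
    # sep is empty when the marker was not found; keep the link as-is then.
    return head if sep else link


# Made-up links, for illustration only.
tagged = 'http://example.com/article.aspx?id=1&rss_lnk=1'
plain = 'http://example.com/article.aspx?id=1'

print(strip_rss_suffix(tagged))
print(strip_rss_suffix(plain))
print(repr(plain.rpartition('&rss')[0]))  # unguarded version on a plain link
```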