02-19-2011, 09:17 PM   #1
clintiepoo
Member
Posts: 19
Karma: 10
Join Date: Feb 2011
Device: kindle
Very new to this - please help me parse a local newspaper's RSS

Hi,

I'm trying to put together a calibre recipe for the Herald and Review (herald-review.com). I don't know Python, so I started from the Science Daily recipe and have been modifying it. Here's what I have so far:

Code:
#!/usr/bin/env python
'''
http://www.herald-review.com
'''
from calibre.web.feeds.news import BasicNewsRecipe


class DecaturHerald(BasicNewsRecipe):
    title                 = u'Herald and Review'
    __author__            = u'Clint'
    description           = u'Decatur, IL Newspaper'
    language              = 'en'
    oldest_article        = 7    # days
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False

    cover_url = 'http://www.herald-review.com/content/tncms/live/global/resources/images/hr_logo.jpg'

    # Keep only the headline, the timestamp, the lead image and the story body
    keep_only_tags = [
        dict(name='h1'),
        dict(name='span', attrs={'class': 'updated'}),
        dict(name='img', attrs={'id': 'img-holder'}),
        dict(name='div', attrs={'id': 'blox-story-text'}),
    ]

    feeds = [
        (u'Local Business', u'http://www.herald-review.com/search/?f=rss&c[]=business/local&sd=desc&s=start_time'),
    ]

Some problems I have:

The title shows up twice in each article, once as a link. I'm not sure how to fix this.
The picture and the date end up on the same line; I'd like them on separate lines.
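
One direction that might address both problems, written as an untested sketch and not a confirmed fix. It assumes the extra title is an h1 that wraps a link, and that the date and picture run together only because both are inline elements; neither has been checked against the site's actual HTML. The class name DecaturHeraldSketch is made up for illustration, and the try/except stand-in is only there so the snippet can be imported outside calibre. It shows only the parts that differ from the recipe above, so feeds and the other settings would still be needed.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so this sketch can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class DecaturHeraldSketch(BasicNewsRecipe):  # hypothetical name
    title = u'Herald and Review (sketch)'

    keep_only_tags = [
        dict(name='h1'),
        dict(name='span', attrs={'class': 'updated'}),
        dict(name='img', attrs={'id': 'img-holder'}),
        dict(name='div', attrs={'id': 'blox-story-text'}),
    ]

    # Assumption: the date and the picture share a line because both are
    # inline elements, so making them block-level may put each on its own line.
    extra_css = '''
        span.updated { display: block; }
        img { display: block; }
    '''

    def preprocess_html(self, soup):
        # Assumption: the unwanted copy of the title is an <h1> that
        # contains a link. Drop every such <h1> and keep the plain ones.
        for h1 in soup.findAll('h1'):
            if h1.find('a') is not None:
                h1.extract()
        return soup
```

If the linked h1 turns out to be the only copy worth keeping, or the two titles differ in some other way (a class or id, say), the test inside preprocess_html would need to change accordingly.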

Any help is appreciated. This is probably really easy, but I'm not seeing it.