Quote:
Originally Posted by square4761
remove_tags = [
    dict(name=['table', 'iframe', 'embed', 'object'])
]
remove_tags_after = dict(name='div', attrs={'class':'article_body'})
feeds = [(u'http://rss.townhall.com/blogs/main'),
         (u'http://rss.townhall.com/columnists/all')
]
def print_version(self, url):
    return url + '?page=full'
First, it is bad etiquette, not to mention just plain wrong, to publish someone else's name and email on the web. Please take a minute to edit the above post and remove them.
Second, I looked in my working area and found a recipe I had just about completed for the columnists. The blogs eluded me because they use JavaScript to print the blog entries. If you replace the above with the code below, you will be in the ballpark for the columnists feed.
I lost interest in it, so when you manage to get it working, take the credit and submit it for others to use. I attached the site's favicon, which you can add to the zip file when you upload it here.
Good luck.
Code:
# Keep only the author block and the column body.
keep_only_tags = [
    dict(name='div', attrs={'class':'authorblock'}),
    dict(name='div', attrs={'id':'columnBody'})
]
# Drop everything that follows the column body.
remove_tags_after = dict(name='div', attrs={'id':'columnBody'})
# Strip embedded media, scripts, forms, and the sharing/comment/footer divs.
remove_tags = [
    dict(name=['iframe', 'img', 'embed', 'object', 'center', 'script', 'form']),
    dict(name='div', attrs={'id':[
        'ShareText', 'Externa', 'Toolbox',
        'ctl00_cphMain_cbComments_dlComments_ctl01_ctl00_Content',
        'ArticleContainer', 'shirttail', 'comments_container',
        'ctl00_cphMain_cbComments_dvReadAll', 'footer']})
]
feeds = [(u'TownHall Columnists', u'http://rss.townhall.com/columnists/all')]

# Ask for the full, single-page version of each article.
def print_version(self, url):
    return url + '&page=full'
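One thing to watch: the quoted recipe appends '?page=full' while the code above appends '&page=full', and which one is right depends on whether the article URL from the feed already carries a query string. Here is a rough sketch of a helper that handles both cases. The name full_page_url is made up for illustration and is not part of calibre. It also assumes Python 3 (urllib.parse) and that the site honors the page=full parameter, which I have not re-verified.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def full_page_url(url):
    """Return url with page=full set (hypothetical helper, not a calibre API)."""
    parts = urlsplit(url)
    # Keep the existing query parameters, dropping any earlier 'page' value.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != 'page']
    params.append(('page', 'full'))
    # Rebuild the URL; urlunsplit inserts '?' before the query when needed.
    return urlunsplit(parts._replace(query=urlencode(params)))
```

If that behaves for the feed's URLs, print_version in the recipe could simply return full_page_url(url) instead of concatenating the suffix by hand.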