11-17-2010, 11:35 AM   #3
Nexus
Member
Posts: 11
Karma: 10
Join Date: Nov 2010
Location: France
Device: PRS-600
Thanks somedayson. For whatever reason, the RSS link I tried the other day for THN didn't work, but it's working fine now. I've worked on it a bit and got some results, but I'm having a really hard time removing the tags that come after the article. I tried things like remove_tags_after and remove_tags_before, with no success unfortunately.

Here's the recipe. It's not pretty, I know, but it's the best I could come up with given what I know. You have no idea how long it took me...

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1289990851(BasicNewsRecipe):
    title = u'THE HOCKEY NEWS'
    oldest_article = 7  # days
    max_articles_per_feed = 5
    no_stylesheets = True

    # Page elements stripped from every downloaded article
    remove_tags = [
        dict(name='div', attrs={'class': 'article_info'}),
        dict(name='div', attrs={'class': 'photo_details'}),
        dict(name='div', attrs={'id': 'comments_container'}),
        dict(name='div', attrs={'id': 'add_comment'}),
        dict(name='div', attrs={'id': 'legal_info'}),
        dict(name='div', attrs={'id': 'breadcrumb'}),
        dict(name='div', attrs={'id': 'site_header'}),
        dict(name='div', attrs={'id': 'site_navigation'}),
        dict(name='div', attrs={'id': 'advertisement'}),
        dict(name='div', attrs={'class': 'tool_menu'}),
    ]

    feeds = [(u'THN', u'http://www.thehockeynews.com/rss/all_categories.xml')]
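
Building a remove_tags list like the one in the recipe means knowing which div ids and classes a page actually uses. Below is a minimal sketch of one way to collect them from a saved article page. It is plain Python 3 standard library, not part of calibre, and the names `DivLister`, `list_divs` and the sample HTML are made up for illustration (only the two div names are borrowed from the recipe).

```python
from html.parser import HTMLParser


class DivLister(HTMLParser):
    """Collect the id and class attributes of every <div> start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (attribute_name, value) pairs, in page order

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for key, value in attrs:
            if key in ('id', 'class') and value:
                self.found.append((key, value))


def list_divs(html):
    """Return (attribute_name, value) pairs for all divs in an HTML string."""
    parser = DivLister()
    parser.feed(html)
    parser.close()
    return parser.found


# Invented sample page; a real one would be read from a saved article file.
sample = (
    '<div id="site_header">menu</div>'
    '<div class="article_info">by someone</div>'
    '<p id="not_a_div">article text</p>'
)

# Print each hit in the form a remove_tags entry takes.
for key, value in list_divs(sample):
    print("dict(name='div', attrs={%r: %r})," % (key, value))
```

The printed lines are only candidates. Which ones belong in remove_tags still has to be decided by looking at the page, since some of those divs will hold the article itself.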