MobileRead Forums - View Single Post - Custom recipes (archive, read-only)

Hamlet53 · 03-15-2010, 11:15 PM

I had requested a recipe for the San Francisco Bay Guardian, and it was included in the latest Calibre release. Unfortunately, the stock recipe downloads only a small part of the weekly paper. I understand why: on the main RSS page of the SFBG web site, the link labeled “Main Site (everything)” does not actually cover everything. Using the stock recipe as a guide, I have prepared the expanded version here, which still does not fetch everything but gets a lot more. I am posting it in case anyone else is interested.

Revised SFBG recipe (post #1609)

Spoiler:

from calibre.web.feeds.news import BasicNewsRecipe

class SanFranciscoBayGuardian(BasicNewsRecipe):
    title = u'San Francisco Bay Guardian'
    language = 'en'
    __author__ = 'Krittika Goyal'
    oldest_article = 31 #days
    max_articles_per_feed = 25
    #encoding = 'latin1'

    no_stylesheets = True
    #remove_tags_before = dict(name='div', attrs={'id':'story_header'})
    #remove_tags_after = dict(name='div', attrs={'id':'shirttail'})
    remove_tags = [
        dict(name='iframe'),
        #dict(name='div', attrs={'class':'related-articles'}),
        #dict(name='div', attrs={'id':['story_tools', 'toolbox', 'shirttail', 'comment_widget']}),
        #dict(name='ul', attrs={'class':'article-tools'}),
        #dict(name='ul', attrs={'id':'story_tabs'}),
    ]

    feeds = [
        ('sfbg', 'http://www.sfbg.com/rss.xml'),
        ('politics', 'http://www.sfbg.com/politics/rss.xml'),
        ('blogs', 'http://www.sfbg.com/blog/rss.xml'),
        ('pixel_vision', 'http://www.sfbg.com/pixel_vision/rss.xml'),
        ('bruce', 'http://www.sfbg.com/bruce/rss.xml'),
    ]

    #def preprocess_html(self, soup):
        #story = soup.find(name='div', attrs={'id':'story_body'})
        #td = heading.findParent(name='td')
        #td.extract()
        #soup = BeautifulSoup('<html><head><title>t</title></head><body></body></html>')
        #body = soup.find(name='body')
        #body.insert(0, story)
        #return soup
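
For anyone who wants to check how much the extra section feeds add beyond the main feed, here is a rough sketch of one way to compare them. It is not part of the recipe and does not use Calibre. It only parses RSS text with the standard library and lists the article links a section feed has that the main feed lacks. The function names (article_links, extra_coverage, make_rss) and the example.com sample data are made up for illustration; in practice you would pass in the XML downloaded from the feed URLs in the recipe.

```python
import xml.etree.ElementTree as ET


def article_links(rss_text):
    """Return the <link> text of every <item> in an RSS 2.0 document, in order."""
    root = ET.fromstring(rss_text)
    return [item.findtext("link", default="").strip() for item in root.iter("item")]


def extra_coverage(main_rss, section_rss_by_name):
    """Map each section feed name to the links it has that the main feed lacks."""
    seen = set(article_links(main_rss))
    extra = {}
    for name, rss_text in section_rss_by_name.items():
        extra[name] = [link for link in article_links(rss_text) if link not in seen]
    return extra


def make_rss(links):
    """Build a tiny RSS 2.0 document for the demo (hypothetical data only)."""
    items = "".join("<item><link>%s</link></item>" % link for link in links)
    return "<rss version='2.0'><channel>%s</channel></rss>" % items


# Hypothetical sample data, not real SFBG feed contents.
MAIN = make_rss(["http://example.com/a", "http://example.com/b"])
SECTIONS = {
    "politics": make_rss(["http://example.com/b", "http://example.com/c"]),
    "blogs": make_rss(["http://example.com/d"]),
}

if __name__ == "__main__":
    # Print, per section feed, how many articles the main feed is missing.
    for name, links in extra_coverage(MAIN, SECTIONS).items():
        print(name, len(links), links)
```

If a section feed reports no extra links, adding it to the feeds list would only duplicate articles already pulled in by the main feed.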