05-17-2011, 10:46 AM | #1 |
Enthusiast
Posts: 37
Karma: 10
Join Date: Apr 2011
Device: none
Recipe for Self Magazine (US) (need help)
This recipe is almost finished; it just needs a little cleanup.
I made a mistake with "remove_tags" (it doesn't remove what it is supposed to). If someone knows how to fix this, feel free to do it! Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1305547242(BasicNewsRecipe):
    title = u'Self Magazine'
    oldest_article = 9            # days
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    # Meant to drop both footer divs, but a dict cannot hold the key
    # 'class' twice: Python keeps only the last value ('printoptions'),
    # so div.articles_footer is never matched.
    remove_tags = [dict(name='div', attrs={'class':'articles_footer', 'class':'printoptions'})]

    def print_version(self, url):
        # Download the printer-friendly page instead of the regular article
        return url + '?printable=true'

    def preprocess_html(self, soup):
        # Replace every link that wraps a single piece of text with that
        # plain text, so the e-book contains no dead links
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    feeds = [
        (u'Healthy, Happiness, Sex and Love', u'http://www.self.com/services/rss/feeds/health.xml'),
        (u'Beauty and Style', u'http://www.self.com/services/rss/feeds/beauty.xml'),
        (u'Self Recipes', u'http://www.self.com/services/rss/feeds/recipes.xml'),
        (u'Fitness and Workouts', u'http://www.self.com/services/rss/feeds/fitness.xml'),
        (u'Healthy Stars', u'http://www.self.com/services/rss/feeds/healthystars.xml'),
        (u'Lucys Blog', u'http://www.self.com/magazine/blogs/lucysblog/rss.xml'),
        (u'Beyond the Beauty Pages', u'http://www.self.com/beauty/blogs/beyondthebeautypages/rss.xml'),
        (u'Diet Like Me', u'http://www.self.com/fooddiet/blogs/dietlikeme/rss.xml'),
        (u'Eat Like Me', u'http://www.self.com/fooddiet/blogs/eatlikeme/rss.xml'),
        (u'SELF Style Secrets', u'http://www.self.com/beauty/blogs/selfstylesecrets/rss.xml'),
        (u'SELFy Stars', u'http://www.self.com/magazine/blogs/selfystars/rss.xml'),
        (u'Healthy SELF', u'http://www.self.com/services/rss/summary'),
    ]
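One likely reason the filter fails, shown in plain Python (calibre is not needed to run it): a dict literal that names the key 'class' twice silently keeps only the last value. Listing one dict per class should let remove_tags match both divs, since each entry in the list is matched on its own. This is a sketch that assumes each div's class attribute is exactly 'articles_footer' or 'printoptions' (class names taken from the recipe; not checked against the live site).

```python
# A dict literal that repeats a key keeps only the last value,
# so this filter can only ever match class="printoptions".
broken = dict(name='div', attrs={'class': 'articles_footer', 'class': 'printoptions'})
print(broken['attrs'])  # {'class': 'printoptions'}

# One dict per class: each entry is matched separately,
# so both kinds of <div> would be removed.
remove_tags = [
    dict(name='div', attrs={'class': 'articles_footer'}),
    dict(name='div', attrs={'class': 'printoptions'}),
]
```

In the recipe, the corrected remove_tags list would simply replace the single-dict version.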
05-17-2011, 05:16 PM | #2 |
I fixed it on my own. Instead of remove_tags, this version uses keep_only_tags to keep only the printbody element:
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1305547242(BasicNewsRecipe):
    title = u'Self Magazine'
    oldest_article = 21           # days
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False
    remove_javascript = True
    # Keep only the element with id="printbody"; everything outside it
    # on the printable page is discarded.
    keep_only_tags = [dict(id=['printbody'])]

    def print_version(self, url):
        # Download the printer-friendly page instead of the regular article
        return url + '?printable=true'

    def preprocess_html(self, soup):
        # Replace every link that wraps a single piece of text with that
        # plain text, so the e-book contains no dead links
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    feeds = [
        (u'Healthy, Happiness, Sex and Love', u'http://www.self.com/services/rss/feeds/health.xml'),
        (u'Beauty and Style', u'http://www.self.com/services/rss/feeds/beauty.xml'),
        (u'Self Recipes', u'http://www.self.com/services/rss/feeds/recipes.xml'),
        (u'Fitness and Workouts', u'http://www.self.com/services/rss/feeds/fitness.xml'),
        (u'Healthy Stars', u'http://www.self.com/services/rss/feeds/healthystars.xml'),
        (u'Lucys Blog', u'http://www.self.com/magazine/blogs/lucysblog/rss.xml'),
        (u'Beyond the Beauty Pages', u'http://www.self.com/beauty/blogs/beyondthebeautypages/rss.xml'),
        (u'Diet Like Me', u'http://www.self.com/fooddiet/blogs/dietlikeme/rss.xml'),
        (u'Eat Like Me', u'http://www.self.com/fooddiet/blogs/eatlikeme/rss.xml'),
        (u'SELF Style Secrets', u'http://www.self.com/beauty/blogs/selfstylesecrets/rss.xml'),
        (u'SELFy Stars', u'http://www.self.com/magazine/blogs/selfystars/rss.xml'),
        (u'Healthy SELF', u'http://www.self.com/services/rss/summary'),
    ]
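The print_version method builds the print URL by appending '?printable=true'. That works only while the feed links carry no query string or fragment of their own. Below is a minimal sketch of a more defensive helper. It assumes a Python 3 build of calibre and that the site accepts printable=true alongside other query parameters (neither is confirmed in this thread); the name printable_url is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def printable_url(url):
    """Return url with printable=true added to its query string.

    Existing query parameters and any #fragment are preserved,
    and the fragment stays after the query where it belongs.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(('printable', 'true'))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the recipe, print_version would then just return printable_url(url). For a made-up link such as http://www.self.com/a.html#top this gives http://www.self.com/a.html?printable=true#top, whereas plain string concatenation would put the query after the fragment, where the server never sees it.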
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post |
Recipe for Caijing Magazine (zh-CN) | gzeric | Recipes | 2 | 08-19-2011 04:59 PM |
Recipe Request for World Magazine | fbrian | Recipes | 3 | 06-05-2011 10:10 AM |
SPIN Magazine recipe | Quistopher | Recipes | 0 | 01-27-2011 09:04 PM |
Create recipe for magazine | BlonG | Recipes | 0 | 10-26-2010 07:46 AM |
Custom Recipe for CNBC Magazine | nittecat | Calibre | 1 | 02-28-2010 04:14 AM |