Hi,
I've given up trying to write a good recipe for MIT's Technology Review at:
http://www.technologyreview.com/
Questions:
Some of the articles at MIT Technology Review are split across multiple pages. How do I get the recipe to fetch all of the pages?
In the middle of some articles, a line reading "Story continues below" appears along with an advertisement. How do I cut both out?
The site has a print option for each article, but each print URL contains the article's id number. How do I build the print URL from the article URL that the feed gives me?
I hope someone will improve my recipe and post it here so that I can see how to solve the problems I ran into.
Thanks...
XG
My recipe follows:
from calibre.web.feeds.news import BasicNewsRecipe

class MITtechnologyReview(BasicNewsRecipe):
    title = u'MIT Technology Review'
    __author__ = u'Xanthan Gum'
    description = 'Technology news from MIT'
    no_stylesheets = True
    remove_tags_before = dict(id='articlebody')
    remove_tags_after = dict(name='h3')
    oldest_article = 7
    max_articles_per_feed = 100
    feeds = [(u'Top Stories', u'http://feeds.technologyreview.com/technology_review_top_stories'),
             (u'Computing', u'http://feeds.technologyreview.com/technology_review_Computing'),
             (u'Web', u'http://feeds.technologyreview.com/technology_review_Web'),
             (u'Communications', u'http://feeds.technologyreview.com/technology_review_Communications'),
             (u'Energy', u'http://feeds.technologyreview.com/technology_review_Energy'),
             (u'Materials', u'http://feeds.technologyreview.com/technology_review_Materials'),
             (u'Biomedicine', u'http://feeds.technologyreview.com/technology_review_Biotech'),
             (u'Business', u'http://feeds.technologyreview.com/technology_review_Biztech')]
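
For the print URL and "Story continues below" questions, the two pieces of logic involved can be sketched as plain functions that run without calibre. This is only a rough sketch and not a working fix: the article URL shape, the print URL template (PRINT_TEMPLATE) and the markup around the "Story continues below" line are all guesses that would need checking against the real pages, and the names print_url and strip_continues are made up for illustration.

```python
import re

# ASSUMPTION: the article URL ends in a numeric id path segment, e.g.
# http://www.technologyreview.com/computing/12345/ (hypothetical example).
ARTICLE_ID = re.compile(r'/(\d+)/?(?:[?#].*)?$')

# ASSUMPTION: hypothetical print URL layout. Copy the real pattern from an
# actual "print" link on the site before relying on this.
PRINT_TEMPLATE = 'http://www.technologyreview.com/printer_friendly_article.aspx?id={id}'


def print_url(url):
    """Map an article URL to the assumed print URL.

    Returns the URL unchanged when no numeric id is found, so the
    recipe would fall back to the normal article page.
    """
    match = ARTICLE_ID.search(url)
    if match is None:
        return url
    return PRINT_TEMPLATE.format(id=match.group(1))


# ASSUMPTION: the marker text sits in its own <p> element. The advertisement
# next to it would need its own rule once its markup is known.
CONTINUES = re.compile(r'<p[^>]*>\s*Story continues below\s*</p>', re.IGNORECASE)


def strip_continues(html):
    """Remove every "Story continues below" paragraph from an HTML string."""
    return CONTINUES.sub('', html)


if __name__ == '__main__':
    print(print_url('http://www.technologyreview.com/computing/12345/'))
    print(strip_continues('<p>a</p><p>Story continues below</p><p>b</p>'))
```

Inside the recipe class, print_url could be called from the print_version(self, url) method, and the pattern could go into preprocess_regexps as (CONTINUES, lambda match: ''). If the print page turns out to hold the whole article on one page, using it might also sidestep the multiple-page problem, but I have not confirmed that.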