Quote:
Originally Posted by somedayson
Here's my latest attempt... I still can't exclude the junk above and below the articles. I tried the suggestions from a few pages earlier in this thread, but don't quite have it.
After about three hours on this total, I'd just love the answer if someone is willing to throw me a bone. I know I'm close...
Your print_version isn't running. It needs to be indented inside the recipe class so that it is a method of the class; otherwise it never gets called. You don't need the keep_only_tags. Try this:
Spoiler:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title = u'Blackhawks Headlines'
    __author__ = 'Starson17'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript = True
    remove_empty_feeds = True
    feeds = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

    # Indented so that it is a method of the recipe class.
    def print_version(self, url):
        # Point at the print page instead of the regular article page.
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        # Drop everything from the first '&' onward.
        main2, middle2, end2 = url.partition('&')
        return main2

    extra_css = '''
        .headline{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
        h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
        p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
        #newsBody{font-family:Helvetica,Arial,sans-serif;font-size:small;text-indent:2em;}
        '''
It should be close. (I threw in some basic formatting.)
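If you want to check what print_version does to a link without running the whole recipe, something like this rough sketch should work in a plain Python session. The function name to_print_url and the sample article URLs are made up for illustration (I'm only assuming the feed links contain 'news.htm?' followed by parameters separated by '&'); the two partition steps are the same as in the recipe.

```python
def to_print_url(url):
    """Standalone copy of the recipe's print_version logic (hypothetical helper)."""
    # Swap the article page for the print page.
    main1, _sep1, end1 = url.partition('news.htm?')
    url = main1 + 'newsprint.htm?' + end1
    # Keep only the part before the first '&'.
    main2, _sep2, _end2 = url.partition('&')
    return main2


# Made-up example link, not taken from the real feed.
sample = 'http://blackhawks.nhl.com/club/news.htm?id=12345&cmpid=rss'
print(to_print_url(sample))
```

Note that if a link doesn't contain 'news.htm?' at all, partition returns the whole URL as the first piece, so 'newsprint.htm?' just gets tacked onto the end. If the feed ever mixes in other kinds of links, you'd want to test for that first.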