09-10-2010, 04:08 PM   #2682
Starson17
Wizard
Quote:
Originally Posted by somedayson
Here's my latest attempt... still can't exclude the junk above and below the articles. Tried the suggestions from a few pages earlier in this thread, but don't quite have it.

Code:
class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title          = u'Blackhawks Headlines'
    oldest_article = 7
    max_articles_per_feed = 100

    feeds          = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

def print_version(self, url):
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        main2, middle2, end2 = url.partition('&')
        return main2

        keep_only_tags [dict(name='div', attrs={'class':'newsBody'})]


After about three hours on this total, I'd just love the answer if someone is willing to throw me a bone. I know I'm close...

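Your print_version isn't running. The def starts at column 0, so it sits outside the class and never becomes a method of the recipe; indent it under the class and it will run. You don't need the keep_only_tags (as written, that line comes after the return and is missing its '=', so it never took effect anyway). Try this: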

Code:
class AdvancedUserRecipe1284145178(BasicNewsRecipe):
    title          = u'Blackhawks Headlines'
    __author__          = 'Starson17'
    oldest_article = 7
    max_articles_per_feed = 100
    remove_javascript = True
    remove_empty_feeds  = True

    feeds          = [(u'Blackhawks Recent Headlines', u'http://blackhawks.nhl.com/rss/news.xml')]

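    def print_version(self, url):
        # Swap the article page for its print page: news.htm? -> newsprint.htm?
        main1, replace1, end1 = url.partition('news.htm?')
        url = main1 + 'newsprint.htm?' + end1
        # Keep everything before the first '&', dropping the remaining query parameters
        main2, middle2, end2 = url.partition('&')
        return main2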

    extra_css = '''
                    .headline{font-family:Arial,Helvetica,sans-serif; font-weight:bold;font-size:large;}
                    h2{font-family:Arial,Helvetica,sans-serif; font-weight:normal;font-size:small;}
                    p{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                    body{font-family:Helvetica,Arial,sans-serif;font-size:small;}
                    #newsBody{font-family:Helvetica,Arial,sans-serif;font-size:small;text-indent:2em;}
                '''


It should be close. (I threw in some basic formatting.)
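
If the indentation point isn't obvious, here is a rough standalone sketch of it. BaseRecipe is a made-up stand-in for BasicNewsRecipe so this runs without calibre, and its do-nothing default is an assumption for the demo, not calibre's actual behavior. The sample URL is hypothetical too.

```python
# Hypothetical stand-in for calibre's BasicNewsRecipe (assumption for this sketch).
class BaseRecipe:
    def print_version(self, url):
        # Assumed default for the demo: leave the URL unchanged.
        return url


class BrokenRecipe(BaseRecipe):
    title = 'Blackhawks Headlines'

# Starts at column 0, so this is a plain module-level function.
# It is NOT part of BrokenRecipe and does not override anything.
def print_version(self, url):
    return url.replace('news.htm?', 'newsprint.htm?')


class FixedRecipe(BaseRecipe):
    title = 'Blackhawks Headlines'

    # Indented under the class, so it overrides the base method.
    def print_version(self, url):
        return url.replace('news.htm?', 'newsprint.htm?')


sample = 'http://example.com/news.htm?id=1'  # hypothetical URL
broken_result = BrokenRecipe().print_version(sample)  # base method runs
fixed_result = FixedRecipe().print_version(sample)    # override runs
```

In the broken version the recipe object never sees the rewritten URL, which is why the junk around the article was still there.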

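To see what the URL rewrite in print_version actually does, you can try the same two partition() calls outside calibre. This is only a sketch: to_print_url is a name I made up, the sample URL is hypothetical (I'm assuming the feed links look roughly like this), and the early return for URLs without 'news.htm?' is an extra guard that is not in the recipe above.

```python
def to_print_url(url):
    """Rewrite an article URL to its print page, mirroring print_version."""
    head, sep, tail = url.partition('news.htm?')
    if not sep:
        # Guard (not in the recipe): without it, a URL lacking 'news.htm?'
        # would come back with 'newsprint.htm?' glued onto the end.
        return url
    print_url = head + 'newsprint.htm?' + tail
    # Keep only the part before the first '&' (drops extra query parameters).
    first_part, _, _ = print_url.partition('&')
    return first_part


# Hypothetical article link, for illustration only.
sample = 'http://blackhawks.nhl.com/club/news.htm?id=12345&cmpid=rss'
result = to_print_url(sample)
```

If the real links carry the article id somewhere other than the first parameter, cutting at the first '&' would drop it, so check a link or two from the feed.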
Last edited by Starson17; 09-10-2010 at 04:12 PM.