Old 05-27-2010, 11:36 PM   #1996
kidtwisted
Device: JetBook-Lite
Help with recipe - articles span more than 1 page

Hello everyone.

I need some help with a recipe for this feed:
http://www.pcper.com/rss/articles.rss

Most of the articles span several pages. I've cleaned the output up a bit, but I'm not sure how to scrape the complete article from the "Click here for the Detailed Review" links. Thanks!

Here's what I have so far.
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1274998412(BasicNewsRecipe):
    title = u'PC Perspective Articles'
    description = 'PC Perspective Articles'
    __author__ = 'KidTwisted'
    #use_embedded_content   = False
    max_articles_per_feed = 25
    oldest_article = 7
    cover_url      = 'http://www.pcper.com/site_gfx/pcpheader_02.gif'

    no_stylesheets = True
    language = 'en'

    remove_javascript = True
    conversion_options = {'linearize_tables': True}
    # reverse_article_order = True

    # Page chrome to strip: header table, category images, navigation, link boxes
    remove_tags = [
        dict(name='table', attrs={'class': 'topwrapper'}),
        dict(name='div', attrs={'class': 'leftcatimg'}),
        dict(name='div', attrs={'class': 'navcontainer1'}),
        dict(name='td', attrs={'class': 'img3'}),
        dict(name='div', attrs={'class': 'mtbg'}),
        dict(name='div', attrs={'class': 'rightcatimg'}),
        dict(name='td', attrs={'class': 'articlelinks'}),
        dict(id='navcontainer'),
    ]

    remove_tags_after = dict(name='div', attrs={'class':'rightcatimg'})


    feeds = [(u'PC Perspective Articles', u'http://www.pcper.com/rss/articles.rss')]
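
A possible starting point for the multi-page part, as a sketch only: the first step is finding where the "Click here for the Detailed Review" link points. The snippet below uses just the Python standard library, not calibre's API. The names `DetailLinkFinder` and `find_detail_link` and the sample markup are hypothetical and made up for illustration; the real pcper.com pages may be structured differently.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class DetailLinkFinder(HTMLParser):
    """Records the href of the first <a> whose text contains a marker phrase."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker.lower()
        self.found = None   # href of the first matching link, once seen
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text pieces collected inside that <a>

    def handle_starttag(self, tag, attrs):
        # Start tracking a link only while no match has been found yet
        if tag == 'a' and self.found is None:
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            if self.marker in ''.join(self._text).lower():
                self.found = self._href
            self._href = None


def find_detail_link(html, base_url, marker='Detailed Review'):
    """Return the absolute URL of the matching link, or None if absent."""
    finder = DetailLinkFinder(marker)
    finder.feed(html)
    finder.close()
    return urljoin(base_url, finder.found) if finder.found else None


# Hypothetical sample markup, not copied from the real site
sample = ('<p>Intro text. <a href="/reviews/example?page=2">'
          'Click here for the Detailed Review</a></p>')
print(find_detail_link(sample, 'http://www.pcper.com/'))
```

Inside a recipe, the same lookup could presumably be done on the soup passed to `preprocess_html`, with the linked page fetched via `index_to_soup` and its body appended to the article. That part is an assumption and has not been tested against this feed.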

Last edited by kidtwisted; 05-28-2010 at 01:04 AM.