Old 11-02-2011, 06:48 PM   #6
nickredding
onlinenewsreader.net
Posts: 320
Karma: 10143
Join Date: Dec 2009
Location: Phoenix, AZ & Victoria, BC
Device: Kindle 3, Kindle Fire, IPad3, iPhone4, Playbook, HTC Inspire
Quote:
Originally Posted by awitko
I reran the recipe to get a copy again. The articles show a short portion of the article and then an image that states subscriber only content...Alex
I contributed an earlier incarnation of the recipe that had a variable omit_paid_content which, if set to True, would skip the paid content articles. However, the recipe has been rewritten since then and that customization was removed (reason unknown).

You can create a custom version of the WSJ (free) recipe that omits the paid articles by adding the following method to the standard recipe's class:
Code:
    
def preprocess_html(self, soup):
    article_title = self.tag_to_string(soup.title)
    # Paid articles carry a subscription-promo div; check for it
    divtag = soup.find('div', 'adSummary subscribePromo recipeNotABCShopAndBuy')
    if divtag:
        # Log the omission and return None so the article is skipped
        self.log("\nPaid article omitted (%s)" % article_title)
        return None
    return soup
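
If you would rather have the old on/off switch back, here is a rough sketch of how the check above could be gated by an omit_paid_content flag. This is not the code from the earlier recipe, just my guess at an equivalent. The class name PaidContentFilter and the constant PAID_DIV_CLASS are made up for illustration; the sketch assumes the surrounding recipe supplies tag_to_string and log, as the snippet above already does.

```python
class PaidContentFilter:
    """Hypothetical helper class (not part of calibre), for illustration only."""

    # Class string copied from the snippet above
    PAID_DIV_CLASS = 'adSummary subscribePromo recipeNotABCShopAndBuy'

    # Flag named after the one in the earlier recipe: True skips paid articles
    omit_paid_content = True

    def preprocess_html(self, soup):
        # With the flag off, hand every article back untouched
        if not self.omit_paid_content:
            return soup
        article_title = self.tag_to_string(soup.title)
        # Same paid-content test as the snippet above
        if soup.find('div', self.PAID_DIV_CLASS):
            self.log("\nPaid article omitted (%s)" % article_title)
            return None
        return soup
```

In an actual recipe you would most likely paste the flag and the method straight into the recipe class rather than keep a separate helper; setting omit_paid_content = False then leaves the paid articles in.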