Thanks for the reply. Yeah, I got similar results for my run... you're right, they must be making some changes.
The only issue I had was the opinion section still not downloading. I know you put in a fix for that a few weeks ago, and the articles in question do contain the tag with the "article-contents" id, but for some reason it's not working. I tried a few other combinations of tags in the keep_only_tags list but still couldn't get it to work (except by removing keep_only_tags entirely, which pulls in too much extra content). A sample parse_index function with two articles (one that works, one that doesn't) is below. Do you get similar results?
Code:
def parse_index(self):
    feeds = []
    articles = []
    # will parse
    title1 = 'HP Article WSJ'
    desc1 = 'about Hewlett Packard'
    url1 = 'http://online.wsj.com/articles/hewlett-packard-split-comes-as-more-investors-say-big-isnt-better-1412643100'
    articles.append({'title': title1, 'url': url1, 'description': desc1, 'date': ''})
    # won't parse (opinion section)
    title2 = "Stephens Article in WSJ"
    desc2 = 'china bubble story'
    url2 = 'http://online.wsj.com/articles/bret-stephens-hong-kong-pops-the-china-bubble-1412636585'
    articles.append({'title': title2, 'url': url2, 'description': desc2, 'date': ''})
    for article in articles:
        print "title:", article['title']
    section = "This Sample Section"
    feeds.append((section, articles))
    return feeds
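For reference, this is roughly what I have for keep_only_tags at the moment, just a minimal sketch keyed off that "article-contents" id (I'm matching on the id only and haven't pinned down the tag name, so treat it as approximate rather than my recipe verbatim):
Code:
# class-level attribute in the recipe; minimal sketch, matching only on the id mentioned above
keep_only_tags = [
    dict(attrs={'id': 'article-contents'}),
]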