Thanks for the reply. Yeah, I got similar results for my run... you're right, they must be making some changes.
The only issue I had was the opinion section still not downloading. I know you put in a fix for that a few weeks ago, and the articles in question do contain the tag with the "article-contents" id, but for some reason it's not working. I tried a few other combinations of tags in the keep_only_tags list but still couldn't get it to work (except by removing keep_only_tags entirely, which pulls in too much extra content). A sample parse_index function with two articles (one that works, one that doesn't) is below. Do you get similar results?
Code:
def parse_index(self):
    feeds = []
    articles = []
    # will parse
    title1 = 'HP Article WSJ'
    desc1 = 'about Hewlett Packard'
    url1 = 'http://online.wsj.com/articles/hewlett-packard-split-comes-as-more-investors-say-big-isnt-better-1412643100'
    articles.append({'title': title1, 'url': url1, 'description': desc1, 'date': ''})
    # won't parse (opinion section)
    title2 = "Stephens Article in WSJ"
    desc2 = 'china bubble story'
    url2 = 'http://online.wsj.com/articles/bret-stephens-hong-kong-pops-the-china-bubble-1412636585'
    articles.append({'title': title2, 'url': url2, 'description': desc2, 'date': ''})
    for article in articles:
        print "title:", article['title']
    section = "This Sample Section"
    feeds.append((section, articles))
    return feeds
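For reference, this is roughly what I have for keep_only_tags at the moment, just a minimal sketch keyed off that "article-contents" id (I'm matching on the id only and haven't pinned down the tag name, so treat it as approximate rather than my recipe verbatim):
Code:
# class-level attribute in the recipe; minimal sketch, matching only on the id mentioned above
keep_only_tags = [
    dict(attrs={'id': 'article-contents'}),
]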