Quote:
Originally Posted by awitko
I reran the recipe to get a copy again. The articles show a short portion of the article and then an image that states subscriber only content...Alex
I contributed an earlier incarnation of the recipe that had a variable omit_paid_content which, if set to True, would skip the paid content articles. However, the recipe has been rewritten since then and that customization was removed (reason unknown).
You can create a custom version of the WSJ (free) recipe that omits the paid articles by adding the following method to a copy of the standard recipe:
Code:
def preprocess_html(self, soup):
    article_title = self.tag_to_string(soup.title)
    # check if article is paid content
    divtag = soup.find('div', 'adSummary subscribePromo recipeNotABCShopAndBuy')
    if divtag:
        self.log("\nPaid article omitted (%s)" % article_title)
        return None
    return soup
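
If you want to try the detection logic outside calibre first, here is a rough stand-alone sketch that uses only the Python standard library. It looks for the same div class string as the method above. The names PromoFinder and is_paid_article, and the two sample pages, are made up for illustration and are not part of the recipe or of calibre.

```python
from html.parser import HTMLParser

# Class string copied from the recipe method above.
PROMO_CLASS = 'adSummary subscribePromo recipeNotABCShopAndBuy'


class PromoFinder(HTMLParser):
    """Hypothetical helper: records whether a subscriber-promo div was seen."""

    def __init__(self):
        super().__init__()
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; compare the full class string
        if tag == 'div' and dict(attrs).get('class') == PROMO_CLASS:
            self.found = True


def is_paid_article(html):
    """Return True if the page contains the subscriber-promo div."""
    finder = PromoFinder()
    finder.feed(html)
    return finder.found


# Made-up sample pages, only to exercise the check.
free_page = '<html><body><div class="article">Full text</div></body></html>'
paid_page = '<html><body><div class="%s">Subscribe</div></body></html>' % PROMO_CLASS

print(is_paid_article(free_page))
print(is_paid_article(paid_page))
```

If the site changes the class names on the promo div, the check stops matching and the string in the recipe method would need to be updated to whatever the page uses at that point.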