05-26-2013, 08:50 AM   #7
Camper65
Enthusiast
Posts: 32
Karma: 10
Join Date: Apr 2011
Device: Kindle wifi; Dell 2in1
A new issue has come up, and I'm not sure whether it's just a problem this week. I had to add recursions = 1 to force the recipe to download the article. The feed site now shows an ad page first, and you have to click "Click here to continue to article" to reach the real page. How can I have the recipe skip that first page automatically or, in other words, pick up the right link from the ad page?

(Here's the modified recipe to try.)

Spoiler:
import re  # needed for the re.compile() call in remove_tags

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup, Tag, BeautifulStoneSoup


class dotnetMagazine(BasicNewsRecipe):
    __author__ = u'Bonni Salles - post in forum if questions for me'
    __version__ = '1.0'
    __license__ = 'GPL v3'
    __copyright__ = u'2013, Bonni Salles'
    title = '.net magazine'
    oldest_article = 7
    no_stylesheets = True
    # Follow links one level deep; added to get past the ad page
    recursions = 1
    encoding = 'utf8'
    use_embedded_content = False
    language = 'en'
    remove_empty_feeds = True
    extra_css = ' body{font-family: Arial,Helvetica,sans-serif } img{margin-bottom: 0.4em} '
    cover_url = u'http://media.netmagazine.futurecdn.net/sites/all/themes/netmag/logo.png'

    # Keep only what lies between the id-less <header> and <footer>
    remove_tags_after = dict(name='footer', id=lambda x: not x)
    remove_tags_before = dict(name='header', id=lambda x: not x)

    # Strip sharing widgets, comments, ads and the sidebar
    remove_tags = [
        dict(name='div', attrs={'class': 'item-list'}),
        dict(name='h4', attrs={'class': 'std-hdr'}),
        dict(name='div', attrs={'class': 'item-list share-links'}),
        dict(name=['script', 'noscript']),
        dict(name='div', attrs={'id': 'comments-form'}),
        dict(name='div', attrs={'id': re.compile('advertorial_block_($|| )')}),
        dict(name='div', attrs={'id': 'right-col'}),
        dict(name='div', attrs={'id': 'comments'}),
        dict(name='div', attrs={'class': 'item-list related-content'}),
    ]

    feeds = [
        (u'net', u'http://feeds.feedburner.com/net/topstories')
    ]
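
To make the question concrete, here is a rough sketch of just the "get the right link from that ad page" step. It assumes the ad page contains an ordinary <a> tag whose text includes "Click here to continue"; the names ContinueLinkFinder and find_continue_link and the example.com URLs are made up for illustration, and none of this has been tried against the real site. It uses only Python 3's standard library, not calibre's own parser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContinueLinkFinder(HTMLParser):
    """Remember the href of the first <a> whose text contains a marker phrase."""

    def __init__(self, marker):
        super().__init__()
        self.marker = marker.lower()
        self.current_href = None   # href of the <a> currently open, if any
        self.current_text = []     # text fragments seen inside that <a>
        self.found = None          # href of the first matching link

    def handle_starttag(self, tag, attrs):
        if tag == 'a' and self.found is None:
            self.current_href = dict(attrs).get('href')
            self.current_text = []

    def handle_data(self, data):
        if self.current_href is not None:
            self.current_text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current_href is not None:
            # Collapse whitespace so line breaks inside the link do not matter
            text = ' '.join(''.join(self.current_text).split()).lower()
            if self.marker in text and self.found is None:
                self.found = self.current_href
            self.current_href = None


def find_continue_link(html, base_url, marker='click here to continue'):
    """Return the absolute URL of the 'continue' link, or None if there is none."""
    finder = ContinueLinkFinder(marker)
    finder.feed(html)
    if finder.found is None:
        return None
    # The href may be relative, so resolve it against the ad page's URL
    return urljoin(base_url, finder.found)
```

Calling find_continue_link(ad_page_html, ad_page_url) would give the article URL to fetch, or None when the page is not an ad page. calibre's BasicNewsRecipe has a skip_ad_pages(soup) hook that looks like the natural place for this kind of check, but wiring it in (and fetching the real page from there) is not shown and would need checking against the calibre documentation.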