I'm trying to rewrite the default .net recipe. It seems the way the feed URLs are handled has changed, and the recipe no longer works properly.
I've been playing with it. I'm comfortable with HTML and CSS and know some PHP and Java. I never really studied Python, but I'm coming to understand more of it by working through all this.
This version gets the article titles, and sometimes the article descriptions, from the newsfeed, but it goes no further: it doesn't pass on the actual URL of each article, so the full article is never pulled.
Spoiler:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from calibre.web.feeds.news import BasicNewsRecipe
import re
class NetMagazineRecipe(BasicNewsRecipe):
    __author__ = u'Marc Busqué <marc@lamarciana.com>'
    __url__ = 'http://www.lamarciana.com'
    __version__ = '1.0'
    __license__ = 'GPL v3'
    __copyright__ = u'2012, Marc Busqué <marc@lamarciana.com>'
    title = u'.net magazine Custom'
    description = u'net is the world’s best-selling magazine for web designers and developers, featuring tutorials from leading agencies, interviews with the web’s biggest names, and agenda-setting features on the hottest issues affecting the internet today.'
    language = 'en'
    tags = 'web development, software'
    oldest_article = 7
    remove_empty_feeds = True
    no_stylesheets = True
    auto_cleanup = True
    cover_url = u'http://media.netmagazine.futurecdn.net/sites/all/themes/netmag/logo.png'
    # remove_tags_above = dict(id='header')
    # remove_tags_below = [dict(name='footer')]
    # keep_only_tags = [
    #     dict(name='article', attrs={'class': re.compile('^node.*$', re.IGNORECASE)}),
    # ]
    # remove_tags = [
    #     dict(name='span', attrs={'class': 'comment-count'}),
    #     dict(name='div', attrs={'class': 'item-list share-links'}),
    #     dict(name='footer'),
    # ]
    # remove_attributes = ['border', 'cellspacing', 'align', 'cellpadding', 'colspan', 'valign', 'vspace', 'hspace', 'alt', 'width', 'height', 'style']
    # extra_css = 'img {max-width: 100%; display: block; margin: auto;} .captioned-image div {text-align: center; font-style: italic;}'
    feeds = [
        (u'.net', u'http://feeds.feedburner.com/net/topstories?format=xml'),
    ]
(I have commented out the tag-filtering options until I can get the recipe working; then I can adjust them to keep what is needed and remove what is not.)
To get it to pass on the URL of each FeedBurner entry, I'm trying the following:
Spoiler:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from calibre.web.feeds.news import BasicNewsRecipe
import re
class NetMagazineRecipe(BasicNewsRecipe):
    __author__ = u'Marc Busqué <marc@lamarciana.com>'
    __url__ = 'http://www.lamarciana.com'
    __version__ = '1.0'
    __license__ = 'GPL v3'
    __copyright__ = u'2012, Marc Busqué <marc@lamarciana.com>'
    title = u'.net magazine Custom'
    description = u'net is the world’s best-selling magazine for web designers and developers, featuring tutorials from leading agencies, interviews with the web’s biggest names, and agenda-setting features on the hottest issues affecting the internet today.'
    language = 'en'
    tags = 'web development, software'
    oldest_article = 7
    remove_empty_feeds = True
    no_stylesheets = True
    auto_cleanup = True
    cover_url = u'http://media.netmagazine.futurecdn.net/sites/all/themes/netmag/logo.png'
    # remove_tags_above = dict(id='header')
    # remove_tags_below = [dict(name='footer')]
    # keep_only_tags = [
    #     dict(name='article', attrs={'class': re.compile('^node.*$', re.IGNORECASE)}),
    # ]
    # remove_tags = [
    #     dict(name='span', attrs={'class': 'comment-count'}),
    #     dict(name='div', attrs={'class': 'item-list share-links'}),
    #     dict(name='footer'),
    # ]
    # remove_attributes = ['border', 'cellspacing', 'align', 'cellpadding', 'colspan', 'valign', 'vspace', 'hspace', 'alt', 'width', 'height', 'style']
    # extra_css = 'img {max-width: 100%; display: block; margin: auto;} .captioned-image div {text-align: center; font-style: italic;}'
    feeds = [
        (u'.net', u'http://feeds.feedburner.com/net/topstories?format=xml'),
    ]

    def get_article_url(self, article):
        # Use the entry's plain <link> value as the article URL
        url = article.get('link', None)
        return url
Can anyone help me adjust how the URL is passed, so that the recipe resolves each feed entry to the actual article URL and can download the articles? Unfortunately there are no print versions of these articles, so the original pages must be used. Thanks.
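To make the question concrete, here is a minimal sketch of the kind of lookup that might be needed. It is an untested guess, not a confirmed fix. It assumes the feed entries are dict-like (as feedparser entries are) and that FeedBurner's original link, when present, appears under a key ending in "_origlink" such as feedburner_origlink; whether this particular feed provides that key has not been checked. The helper name pick_article_url is hypothetical.

```python
def pick_article_url(article):
    """Return a best-guess article URL from a dict-like feed entry, or None.

    Hypothetical helper: the key names below are assumptions about how the
    feed is parsed, not something confirmed for the .net feed.
    """
    def is_http_url(value):
        # Accept only absolute http(s) URLs; reject empty values and tag: URIs
        return isinstance(value, str) and value.startswith(('http://', 'https://'))

    # 1. Prefer FeedBurner's original link, if the entry carries one
    for key in article.keys():
        if key.endswith('_origlink') and is_http_url(article[key]):
            return article[key]

    # 2. Otherwise fall back to the usual link field, then the entry id
    for key in ('link', 'id'):
        value = article.get(key)
        if is_http_url(value):
            return value

    return None


if __name__ == '__main__':
    entry = {
        'link': 'http://feedproxy.example/redirect/123',
        'feedburner_origlink': 'http://www.example.com/real-article',
    }
    print(pick_article_url(entry))
```

Inside the recipe, get_article_url(self, article) could simply return pick_article_url(article). If the entries turn out to carry neither key, logging article.keys() for one entry would show which fields are actually available.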