Quote:
Originally Posted by m.tarenskeen
Hi,
Recently De Volkskrant (NL) changed their website. I think that is why I cannot read news downloaded with calibre offline on my e-reader anymore.
The titles and headlines are fetched, but instead of the corresponding articles I just see a URL pointing to the article that I want to read. My simple e-reader does not have WiFi.
I am using calibre 0.7.23.
Maybe one of the "recipe gurus" here can take a look at volkskrant.recipe and see if it can be fixed?
MT
The last two RSS feeds appear to be broken on the site itself. They are Feedburner links (cough cough), so I commented them out. If valid RSS links turn up later, let me know and I will gladly update the recipe.
Here is a working version. I modified it to use calibre's obfuscated-article hook to fetch the print version of each article, and removed keep_only_tags.
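In case it helps anyone adapting this: the print lookup relies on the site exposing a print URL next to each article URL. Below is a rough standalone sketch of that mapping, inferred only from the single example URL pair in the comments at the end of the recipe. The names `guess_print_url` and `PRINT_LINK_RE` are my own and are not part of calibre or of the recipe, and the pattern is an assumption that may not hold for other sections or after another site change.

```python
import re

# Assumed layout of a regular article URL, based on one example only:
#   /vk/nl/<number>/<section>/article/detail/<id>/<yyyy>/<mm>/<dd>/<slug>.dhtml
ARTICLE_RE = re.compile(
    r'^(?P<base>https?://www\.volkskrant\.nl/vk/nl/\d+)/'
    r'[^/]+/article/detail/'
    r'(?P<id>\d+)/(?P<year>\d{4})/\d{2}/\d{2}/'
    r'(?P<slug>[^/]+\.dhtml)$'
)

# Year-agnostic pattern for spotting a print link (hypothetical helper;
# the recipe itself hard-codes 2010 in its link regex).
PRINT_LINK_RE = re.compile(r'/\d{4}/article/print/')


def guess_print_url(url):
    """Return the assumed print URL for an article URL, or None if the
    URL does not look like the one known example."""
    m = ARTICLE_RE.match(url)
    if m is None:
        return None
    return '{base}/{year}/article/print/detail/{id}/{slug}'.format(
        **m.groupdict())
```

Calling `guess_print_url` on the original URL from the recipe comments gives back the print URL listed there. The recipe does not build the URL this way; it follows the print link found on the article page, which is more robust if the layout differs between sections. The modified recipe itself follows.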
Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'
'''
Modified by Tony Stegall
on 10/10/10 to include function to grab print version of articles
'''
from calibre.web.feeds.news import BasicNewsRecipe
'''
added by Tony Stegall
'''
#######################################################
from calibre.ptempfile import PersistentTemporaryFile
#######################################################
class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title = u'De Volkskrant'
    __author__ = 'acidzebra'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    language = 'nl'
    extra_css = '''
        body{font-family:Arial,Helvetica,sans-serif; font-size:small;}
        h1{font-size:large;}
    '''
    '''
    Change Log:
    Date: 10/10/10 - Modified to use article obfuscation to fetch the print version
    Author: Tony Stegall
    '''
    #######################################################################################################
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        self.log('THE CURRENT URL IS:', url)
        br.open(url)
        try:
            # Follow the first link on the article page that points to the print version.
            # Note: the year 2010 is hard-coded in this pattern.
            response = br.follow_link(url_regex='.*?(2010)(\\/)(article)(\\/)(print)(\\/)', nr = 0)
            html = response.read()
        except Exception:
            # No print link found: fall back to the regular article page.
            response = br.open(url)
            html = response.read()
        # Save the page to a temporary file and hand its path back to calibre.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name
    ###############################################################################################################
    feeds = [
        (u'Laatste Nieuws', u'http://volkskrant.nl/rss/laatstenieuws.rss'),
        (u'Binnenlands nieuws', u'http://volkskrant.nl/rss/nederland.rss'),
        (u'Buitenlands nieuws', u'http://volkskrant.nl/rss/internationaal.rss'),
        (u'Economisch nieuws', u'http://volkskrant.nl/rss/economie.rss'),
        (u'Sportnieuws', u'http://volkskrant.nl/rss/sport.rss'),
        (u'Kunstnieuws', u'http://volkskrant.nl/rss/kunst.rss'),
        # Both of these RSS feeds link back to the main volkskrant.nl URL, i.e. they are broken.
        # If someone happens to know the correct paths, they can put them in here.
        #(u'Wetenschapsnieuws', u'http://feeds.feedburner.com/DeVolkskrantWetenschap'),
        #(u'Technologienieuws', u'http://feeds.feedburner.com/vkmedia')
    ]
    # Example of the two URL formats:
    # original url: http://www.volkskrant.nl/vk/nl/2668/Buitenland/article/detail/1031493/2010/10/10/Noord-Korea-ziet-nieuwe-leider.dhtml
    # print url : http://www.volkskrant.nl/vk/nl/2668/2010/article/print/detail/1031493/Noord-Korea-ziet-nieuwe-leider.dhtml