10-10-2010, 03:41 PM | #1 |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
volkskrant.recipe broken
Hi,
Recently De Volkskrant (NL) changed its website. I think that is why I can no longer read news downloaded with calibre offline on my e-reader. The titles and headlines are fetched, but instead of the article text I see only a URL pointing to the article. My simple e-reader does not have WiFi. I am using calibre 0.7.23. Maybe one of the "recipe gurus" here can take a look at volkskrant.recipe and see if it can be fixed? MT |
10-10-2010, 06:40 PM | #2 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
Here is a working version. I modified it to use obfuscation to fetch the print version of the articles, and I removed keep_only_tags.
|
10-11-2010, 04:23 AM | #3 |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
Thanks!
|
10-12-2010, 05:58 AM | #4 |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
[QUOTE=TonytheBookworm;1156268]
Code:
try:
    response = br.follow_link(url_regex='.*?(2010)(\\/)(article)(\\/)(print)(\\/)', nr = 0)
    html = response.read()
except:
    response = br.open(url)
    html = response.read()
[/QUOTE]
It looks like this will only work in 2010 and will be outdated in less than three months. What about something like this?
Code:
try:
    for yy in range(2010, 2020):
        response = br.follow_link(url_regex='.*?(%d)(\\/)(article)(\\/)(print)(\\/)' % yy, nr = 0)
(BTW: How do I keep my indentation intact in this online message editor? When I use "Preview Post" it disappears. Not exactly what I want when posting Python code.) |
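Another way to look at the year question, as a rough sketch only: instead of looping over years, the pattern could accept any four-digit year. The helper name `is_print_link` and the example.com URLs below are made up for illustration; they are not part of the posted recipe, and I have not checked this against the live Volkskrant pages.

```python
import re

# Hypothetical pattern: same shape as the one in the recipe, but with
# \d{4} in place of the hard-coded 2010.
PRINT_LINK_PATTERN = r'.*?(\d{4})/article/print/'


def is_print_link(href):
    """Return True if href contains <four digits>/article/print/."""
    return re.search(PRINT_LINK_PATTERN, href) is not None


# Made-up URLs, for illustration only.
print(is_print_link('http://www.example.com/nieuws/2010/article/print/detail/1.html'))  # True
print(is_print_link('http://www.example.com/nieuws/2011/article/print/detail/2.html'))  # True
print(is_print_link('http://www.example.com/nieuws/2011/article/detail/2.html'))        # False
```

If this holds up, the same pattern string could presumably be passed as url_regex to br.follow_link in place of the 2010-only version. The trade-off is that it would also match any other four-digit path segment followed by /article/print/.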
10-12-2010, 07:54 AM | #5 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
10-12-2010, 06:00 PM | #6 | |
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
|
Quote:
|
|
10-14-2010, 07:29 PM | #7 | |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
Updated feed links in volkskrant.recipe
Quote:
But I did find and replace the broken feed links for "Technologie nieuws" and "Wetenschap"; they are now "Media" and "Gezondheid & Wetenschap". My updated version of volkskrant.recipe is attached. BTW: I still think the "only-2010 bug" should be fixed before the recipe is included in the calibre distribution. BTW2: The volkskrant recipe that ships with the current version of calibre is not only broken, its filename is also wrong: "volkskant.recipe" should be "volkskrant.recipe". |
|
10-14-2010, 07:40 PM | #8 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I fixed the year problem in the calibre version of the recipe. You can see it in the calibre source code, or wait for the next release.
|
12-31-2010, 06:27 AM | #9 | |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
Quote:
I have thought of a fix for this issue. I will test it as soon as 2011 has begun, and will report what I find and what I did to fix the problem. Stay tuned! |
|
01-01-2011, 11:18 AM | #10 | |
Member
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
|
update: volkskrant recipe 2010-->2011
Quote:
Code:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__ = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

'''
Modified by Tony Stegall
on 10/10/10 to include function to grab print version of articles
'''

from datetime import date
from calibre.web.feeds.news import BasicNewsRecipe

'''
added by Tony Stegall
'''
#######################################################
from calibre.ptempfile import PersistentTemporaryFile
#######################################################


class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title = u'De Volkskrant'
    __author__ = 'acidzebra'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    language = 'nl'

    extra_css = '''
        body{font-family:Arial,Helvetica,sans-serif;font-size:small;}
        h1{font-size:large;}
    '''

    '''
    Change Log:
    Date: 10/10/10 - Modified code to include obfuscated to get the print version
    Author: Tony Stegall

    Date: 01/01/11 - Modified for better results around December/January.
    Author: Martin Tarenskeen
    '''
    #######################################################################################################
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        print 'THE CURRENT URL IS: ', url
        br.open(url)
        year = date.today().year

        try:
            response = br.follow_link(url_regex='.*?(%d)(\\/)(article)(\\/)(print)(\\/)'%year, nr = 0)
            html = response.read()
        except:
            year = year-1
            try:
                response = br.follow_link(url_regex='.*?(%d)(\\/)(article)(\\/)(print)(\\/)'%year, nr = 0)
                html = response.read()
            except:
                response = br.open(url)
                html = response.read()

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

    ###############################################################################################################

    '''
    Change Log:
    Date: 10/15/2010
    Feeds updated by Martin Tarenskeen
    '''

    feeds = [
        (u'Laatste Nieuws', u'http://www.volkskrant.nl/rss/laatstenieuws.rss'),
        (u'Binnenland', u'http://www.volkskrant.nl/rss/nederland.rss'),
        (u'Buitenland', u'http://www.volkskrant.nl/rss/internationaal.rss'),
        (u'Economie', u'http://www.volkskrant.nl/rss/economie.rss'),
        (u'Sport', u'http://www.volkskrant.nl/rss/sport.rss'),
        (u'Cultuur', u'http://www.volkskrant.nl/rss/kunst.rss'),
        (u'Gezondheid & Wetenschap', u'http://www.volkskrant.nl/rss/wetenschap.rss'),
        (u'Internet & Media', u'http://www.volkskrant.nl/rss/media.rss')
    ]
|
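For anyone who wants to check the December/January behaviour without running calibre, the "current year first, then previous year" idea from get_obfuscated_article can be sketched as plain functions. This is only an illustration under my own assumptions: `candidate_patterns`, `pick_print_link` and the example.com URLs are hypothetical names and data, and the sketch works on a list of link strings rather than on the browser object the recipe uses.

```python
import re
from datetime import date


def candidate_patterns(today=None):
    """Patterns to try in order: current year, then previous year."""
    year = (today or date.today()).year
    template = r'.*?(%d)/article/print/'
    return [template % y for y in (year, year - 1)]


def pick_print_link(links, today=None):
    """Return the first link matching the current-year pattern,
    else the first matching the previous-year pattern, else None."""
    for pattern in candidate_patterns(today):
        for href in links:
            if re.search(pattern, href):
                return href
    return None


# Made-up links: an article published in 2010 and read on 1 January 2011.
links = [
    'http://www.example.com/nieuws/2010/article/detail/1.html',
    'http://www.example.com/nieuws/2010/article/print/detail/1.html',
]
print(pick_print_link(links, today=date(2011, 1, 1)))
```

A return value of None corresponds to the last fallback in the recipe, where the original article URL is opened as-is. Passing `today` explicitly is just a convenience so the year rollover can be tried on any date.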
|
Tags |
recipe, volkskrant |