Old 10-10-2010, 03:41 PM   #1
m.tarenskeen
Member
m.tarenskeen began at the beginning.
 
Posts: 14
Karma: 10
Join Date: Oct 2010
Device: BeBook One
volkskrant.recipe broken

Hi,

Recently De Volkskrant (NL) changed its website. I think that is why I can no longer read news downloaded with calibre offline on my e-reader.

The titles and headlines are fetched, but instead of the corresponding articles I only see a URL pointing to the article I want to read. My simple e-reader does not have WiFi.

I am using calibre 0.7.23.
Maybe one of the "recipe gurus" here can take a look at volkskrant.recipe and see if it can be fixed?

MT
Old 10-10-2010, 06:40 PM   #2
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
The last two RSS feeds appear to be broken on the site itself. They are FeedBurner links, so I commented them out. If valid RSS links are discovered later, let me know and I will gladly update the recipe.

Here is a working version. It is modified to use calibre's obfuscated-article hook to fetch the print version of the articles, and keep_only_tags has been removed.

Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

'''
 Modified by Tony Stegall
 on 10/10/10 to include function to grab print version of articles
'''

from calibre.web.feeds.news import BasicNewsRecipe
'''
added by Tony Stegall
'''
#######################################################
from calibre.ptempfile import PersistentTemporaryFile
#######################################################

class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title          = u'De Volkskrant'
    __author__     = 'acidzebra'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    language = 'nl'

    
    extra_css      = '''
                        body{font-family:Arial,Helvetica,sans-serif; font-size:small;}
                        h1{font-size:large;}
                     '''
    '''
      Change Log:
        Date:       10/10/10  - Modified code to include obfuscated to get the print version
        Author:   Tony Stegall
    '''
   ####################################################################################################### 
    temp_files = []
    articles_are_obfuscated = True

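    def get_obfuscated_article(self, url):
        br = self.get_browser()
        self.log('The current URL is:', url)
        br.open(url)

        try:
            # Follow the first link on the page that points to the print version.
            # Note: the year 2010 is hard-coded in this pattern.
            response = br.follow_link(url_regex=r'.*?(2010)(\/)(article)(\/)(print)(\/)', nr=0)
            html = response.read()
        except Exception:
            # No print link found: fall back to the normal article page.
            response = br.open(url)
            html = response.read()

        # calibre reads the article from the temporary file whose path is returned.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name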

   ###############################################################################################################
   
    feeds          = [
                      (u'Laatste Nieuws', u'http://volkskrant.nl/rss/laatstenieuws.rss'),
                      (u'Binnenlands nieuws', u'http://volkskrant.nl/rss/nederland.rss'),
                      (u'Buitenlands nieuws', u'http://volkskrant.nl/rss/internationaal.rss'),
                      (u'Economisch nieuws', u'http://volkskrant.nl/rss/economie.rss'),
                      (u'Sportnieuws', u'http://volkskrant.nl/rss/sport.rss'),
                      (u'Kunstnieuws', u'http://volkskrant.nl/rss/kunst.rss'),
                      # Both of these RSS feeds link back to the main volkskrant.nl URL, i.e. they are broken.
                      # If someone happens to know the correct paths, they can put them in here.
                      # (A triple-quoted string here would become an extra list entry, so use comments.)
                      #(u'Wetenschapsnieuws', u'http://feeds.feedburner.com/DeVolkskrantWetenschap'),
                      #(u'Technologienieuws', u'http://feeds.feedburner.com/vkmedia')
                      ]

'''
Example of the URL formats
'''
# original url: http://www.volkskrant.nl/vk/nl/2668/Buitenland/article/detail/1031493/2010/10/10/Noord-Korea-ziet-nieuwe-leider.dhtml 
# print url :   http://www.volkskrant.nl/vk/nl/2668/2010/article/print/detail/1031493/Noord-Korea-ziet-nieuwe-leider.dhtml
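
As a quick sanity check of the pattern (a sketch only, not part of the recipe), the two example URLs above can be tested with the standard `re` module. The helper name `is_print_link` is made up for this illustration. Because the pattern starts with `.*?`, matching from the start of the URL gives the same result as searching anywhere in it.

```python
import re

# Pattern from the recipe above, written as a raw string.
PRINT_LINK = re.compile(r'.*?(2010)(\/)(article)(\/)(print)(\/)')

original_url = ('http://www.volkskrant.nl/vk/nl/2668/Buitenland/article/detail/'
                '1031493/2010/10/10/Noord-Korea-ziet-nieuwe-leider.dhtml')
print_url = ('http://www.volkskrant.nl/vk/nl/2668/2010/article/print/detail/'
             '1031493/Noord-Korea-ziet-nieuwe-leider.dhtml')


def is_print_link(url):
    """Return True if url contains the '2010/article/print/' path segment."""
    return PRINT_LINK.match(url) is not None


print(is_print_link(original_url))  # False: '2010' is followed by '/10/10/'
print(is_print_link(print_url))     # True: contains '2010/article/print/'
```

So only the print link on the article page is followed, and the normal article URL itself never matches.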
Old 10-11-2010, 04:23 AM   #3
m.tarenskeen
Thanks!
Old 10-12-2010, 05:58 AM   #4
m.tarenskeen
Quote:
Originally Posted by TonytheBookworm
Code:
        try:
            response = br.follow_link(url_regex='.*?(2010)(\\/)(article)(\\/)(print)(\\/)', nr = 0)
            html = response.read()
        except:
            response = br.open(url)
            html = response.read()

It looks like this will only work in 2010, so it will already be outdated in less than three months.
What about something like this?

Code:
        try:
            for yy in range(2010, 2020):
                response = br.follow_link(url_regex='.*?(%d)(\\/)(article)(\\/)(print)(\\/)' % yy, nr = 0)

(BTW: How do I keep my indentation intact in this online message editor? When I use "Preview Post" it disappears. Not exactly what I want when posting Python code.)
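
On the year-range idea above, here is a minimal standalone sketch that runs without a browser. It assumes the print links keep the same `<year>/article/print/` shape in later years, which nobody in this thread can guarantee. The helper names `print_link_patterns` and `first_print_link` and the first entry in `links` are made up for the illustration. The point to note is that the `%` formatting has to be applied to the pattern string itself, before the result is passed on to anything else.

```python
import re


def print_link_patterns(years):
    """Build one (year, compiled pattern) pair per candidate year."""
    # The % operator is applied to the pattern string here,
    # not to the result of a later function call.
    return [(yy, re.compile(r'.*?(%d)(\/)(article)(\/)(print)(\/)' % yy))
            for yy in years]


def first_print_link(urls, years):
    """Return (year, url) for the first URL matching any year, else None."""
    for yy, pattern in print_link_patterns(years):
        for url in urls:
            if pattern.match(url):
                return yy, url
    return None


links = [
    # Hypothetical non-article link, only here to show a non-match.
    'http://www.volkskrant.nl/vk/nl/2668/Buitenland/index.dhtml',
    'http://www.volkskrant.nl/vk/nl/2668/2010/article/print/detail/'
    '1031493/Noord-Korea-ziet-nieuwe-leider.dhtml',
]

print(first_print_link(links, range(2010, 2020)))
```

In a recipe, the same loop would need its own try/except around each `follow_link` call, because a failed lookup for one year raises an exception and would otherwise end the loop at the first year.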
Old 10-12-2010, 07:54 AM   #5
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by m.tarenskeen View Post
(BTW: How do I keep my indentation intact in this online message editor? When I use "Preview Post" it disappears. Not exactly what I want when posting Python code.)
Highlight the code and press the code button (the hash/pound/number sign).
Old 10-12-2010, 06:00 PM   #6
TonytheBookworm
Quote:
Originally Posted by m.tarenskeen
It looks like this will only work in 2010, so it will already be outdated in less than three months.
What about something like this?

Code:
        try:
            for yy in range(2010, 2020):
                response = br.follow_link(url_regex='.*?(%d)(\\/)(article)(\\/)(print)(\\/)' % yy, nr = 0)

Yeah, I thought about that when writing the code. However, I figured there is no guarantee the URLs will still be in that format in 2011, so I kept it the way it was. I expect that when 2011 rolls around they will change the site and force us to rewrite the recipe anyway. Good thinking on the date range, though.
Old 10-14-2010, 07:29 PM   #7
m.tarenskeen
Updated feed links in volkskrant.recipe

Quote:
Originally Posted by TonytheBookworm View Post
Yeah, I thought about that when writing the code. However, I figured there is no guarantee the URLs will still be in that format in 2011, so I kept it the way it was. I expect that when 2011 rolls around they will change the site and force us to rewrite the recipe anyway. Good thinking on the date range, though.
...but I need to think it through some more to make the trick really work. My first experiments did not give the desired result.

But I did find replacements for the broken "Technologie nieuws" and "Wetenschap" links. I replaced them with "Media" and "Gezondheid & Wetenschap".

My updated version of volkskrant.recipe is attached.

BTW: I still think the "only-2010 bug" should be fixed before the recipe can be included in the calibre distribution.

BTW2: The volkskrant recipe that comes with the current version of calibre is not only broken, its filename is also wrong: "volkskant.recipe" should be "volkskrant.recipe".
Attached Files
File Type: txt volkskrant.recipe.txt (3.0 KB, 299 views)
Old 10-14-2010, 07:40 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
I fixed the year problem in the calibre version of the recipe. You can see it in the calibre source code, or wait for the next release.
Old 12-31-2010, 06:27 AM   #9
m.tarenskeen
Quote:
Originally Posted by kovidgoyal View Post
I fixed the year problem in the calibre version of the recipe. You can see it in the calibre source code, or wait for the next release.
I have a feeling the recipe will not work correctly at the beginning of 2011. It will then only look for articles from 2011, but it probably still needs to fetch older articles from December 2010 as well. I think the current recipe will fail on that point.

I have thought of a fix for this issue. I will test it as soon as 2011 has begun, and will report what I find and what I did to fix the problem. Stay tuned!
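
To make the concern concrete, here is a small sketch of which years the print links could carry on a given download day. It assumes the recipe's `oldest_article = 7` setting defines the window and that the year in the print URL follows the article date; both are assumptions, and `candidate_years` is a made-up helper name, not something from calibre.

```python
from datetime import date, timedelta


def candidate_years(today, oldest_article_days):
    """Years an article inside the download window can belong to, newest first."""
    oldest = today - timedelta(days=oldest_article_days)
    return list(range(today.year, oldest.year - 1, -1))


print(candidate_years(date(2011, 1, 1), 7))    # [2011, 2010]
print(candidate_years(date(2010, 10, 12), 7))  # [2010]
```

On 1 January 2011 the window still reaches back into December 2010, so a pattern built from the current year alone would miss those articles.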
Old 01-01-2011, 11:18 AM   #10
m.tarenskeen
update: volkskrant recipe 2010-->2011

I have done some testing and my modified version gives better results than the currently distributed recipe. This is my version:

Code:
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

'''
 Modified by Tony Stegall
 on 10/10/10 to include function to grab print version of articles
'''

from datetime import date
from calibre.web.feeds.news import BasicNewsRecipe
'''
added by Tony Stegall
'''
#######################################################
from calibre.ptempfile import PersistentTemporaryFile
#######################################################

class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title          = u'De Volkskrant'
    __author__     = 'acidzebra'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    language = 'nl'

    extra_css      = '''
                        body{font-family:Arial,Helvetica,sans-serif;font-size:small;}
                        h1{font-size:large;}
                     '''
    '''
      Change Log:
        Date:       10/10/10  - Modified code to include obfuscated to get the print version
        Author:   Tony Stegall
        
        Date:       01/01/11  - Modified for better results around December/January.
        Author:   Martin Tarenskeen
    '''
   #######################################################################################################
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        self.log('The current URL is:', url)
        br.open(url)
        year = date.today().year

        html = None
        # Try this year's print link first, then last year's, so that
        # December articles are still found at the start of January.
        for yy in (year, year - 1):
            try:
                response = br.follow_link(url_regex=r'.*?(%d)(\/)(article)(\/)(print)(\/)' % yy, nr=0)
                html = response.read()
                break
            except Exception:
                continue

        if html is None:
            # No print link found: fall back to the normal article page.
            response = br.open(url)
            html = response.read()

        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

   ###############################################################################################################

    '''
      Change Log:
       Date: 10/15/2010
       Feeds updated by Martin Tarenskeen
    '''

    feeds          = [
                      (u'Laatste Nieuws', u'http://www.volkskrant.nl/rss/laatstenieuws.rss'),
                      (u'Binnenland', u'http://www.volkskrant.nl/rss/nederland.rss'),
                      (u'Buitenland', u'http://www.volkskrant.nl/rss/internationaal.rss'),
                      (u'Economie', u'http://www.volkskrant.nl/rss/economie.rss'),
                      (u'Sport', u'http://www.volkskrant.nl/rss/sport.rss'),
                      (u'Cultuur', u'http://www.volkskrant.nl/rss/kunst.rss'),
                      (u'Gezondheid & Wetenschap', u'http://www.volkskrant.nl/rss/wetenschap.rss'),
                      (u'Internet & Media', u'http://www.volkskrant.nl/rss/media.rss') ]