10-10-2010, 06:40 PM   #2
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by m.tarenskeen
Hi,

Recently De Volkskrant (NL) changed their website. I think that is why I cannot read news downloaded with calibre offline on my e-reader anymore.

The titles and headlines are fetched, but instead of the corresponding articles I only see a URL pointing to the article I want to read. My simple e-reader does not have WiFi.

I am using calibre 0.7.23.
Maybe one of the "recipe gurus" here can take a look at volkskrant.recipe and see if it can be fixed?

MT
The last two RSS feeds appear to be broken on the site itself. They are FeedBurner links (cough cough), but anyway I commented them out. If valid RSS links turn up later, let me know and I will gladly update the recipe.
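Before re-enabling a replacement feed URL, it can help to check that it really returns a feed and not the home page it redirects to. Below is a minimal sketch, assuming you have already downloaded the response body (with a browser, `urllib`, or similar). The function name `looks_like_feed` is hypothetical and not part of calibre; it only checks the root element, so it is a quick sanity test, not a full feed validator.

```python
import xml.etree.ElementTree as ET


def looks_like_feed(body):
    """Return True if `body` parses as XML with an RSS, Atom or RDF root.

    A broken feed URL that redirects to the site's home page returns HTML,
    which either fails to parse as XML or has an <html> root element,
    so this returns False for it.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return False
    # Namespaced roots look like '{http://www.w3.org/2005/Atom}feed';
    # strip the namespace before comparing.
    tag = root.tag.rsplit('}', 1)[-1].lower()
    return tag in ('rss', 'feed', 'rdf')
```

For example, `looks_like_feed('<rss version="2.0"><channel/></rss>')` is `True`, while an HTML page such as `'<html><body><p>Home</p></body></html>'` gives `False`.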

Here is a working version. I modified it to use calibre's obfuscated-article hooks (`articles_are_obfuscated` and `get_obfuscated_article`) to fetch the print version of each article, and removed the keep_only_tags.
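The recipe finds the print page by calling mechanize's `follow_link(url_regex=..., nr=0)` on the article page, i.e. it follows the first link whose URL matches the pattern. To see roughly what that lookup does without a network connection, here is a standard-library sketch. The pattern `/<four-digit year>/article/print/` is an assumption generalised from the single example URL in the recipe's comments, and `find_print_link` / `_LinkCollector` are hypothetical names, not calibre or mechanize APIs.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumption: print links contain "/<year>/article/print/", generalised
# from the one example print URL in the recipe's comments.
PRINT_LINK_RE = re.compile(r'/\d{4}/article/print/')


class _LinkCollector(HTMLParser):
    """Collect the raw href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.hrefs.append(value)


def find_print_link(html, page_url):
    """Return the absolute URL of the first print link in `html`, or None."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    for href in collector.hrefs:
        if PRINT_LINK_RE.search(href):
            # Resolve relative hrefs against the article page URL.
            return urljoin(page_url, href)
    return None
```

Returning `None` when no link matches mirrors the recipe's fallback, where the `except` branch simply re-opens the original article page.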
Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__copyright__ = '2009, Kovid Goyal <kovid@kovidgoyal.net>'
__docformat__ = 'restructuredtext en'

'''
 Modified by Tony Stegall
 on 10/10/10 to include function to grab print version of articles
'''

from calibre.web.feeds.news import BasicNewsRecipe
#######################################################
# Added by Tony Stegall
from calibre.ptempfile import PersistentTemporaryFile
#######################################################

class AdvancedUserRecipe1249039563(BasicNewsRecipe):
    title          = u'De Volkskrant'
    __author__     = 'acidzebra'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    language = 'nl'

    
    extra_css      = '''
                        body{font-family:Arial,Helvetica,sans-serif; font-size:small;}
                        h1{font-size:large;}
                     '''
    '''
      Change Log:
        Date:     10/10/10 - Modified to use the obfuscated-article hook to get the print version
        Author:   Tony Stegall
    '''
    ###################################################################
    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        # Open the article page, then follow the first link pointing to its
        # print version; fall back to the article page if there is none.
        br = self.get_browser()
        self.log('Fetching article:', url)
        br.open(url)

        try:
            # Print links look like .../<year>/article/print/...; matching
            # any four-digit year keeps this working after 2010.
            response = br.follow_link(url_regex=r'/\d{4}/article/print/', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()

        # calibre reads the article from the temporary file path we return.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

    ###################################################################
    feeds          = [
                      (u'Laatste Nieuws', u'http://volkskrant.nl/rss/laatstenieuws.rss'),
                      (u'Binnenlands nieuws', u'http://volkskrant.nl/rss/nederland.rss'), 
                      (u'Buitenlands nieuws', u'http://volkskrant.nl/rss/internationaal.rss'), 
                      (u'Economisch nieuws', u'http://volkskrant.nl/rss/economie.rss'), 
                      (u'Sportnieuws', u'http://volkskrant.nl/rss/sport.rss'), 
                      (u'Kunstnieuws', u'http://volkskrant.nl/rss/kunst.rss'), 
                      # Both of these RSS feeds link back to the main volkskrant.nl URL, i.e. they are broken.
                      # If someone happens to know the correct paths, they can put them in here.
                      #(u'Wetenschapsnieuws', u'http://feeds.feedburner.com/DeVolkskrantWetenschap'), 
                      #(u'Technologienieuws', u'http://feeds.feedburner.com/vkmedia')
                      ]

# Example of the URL formatting:
# original url: http://www.volkskrant.nl/vk/nl/2668/Buitenland/article/detail/1031493/2010/10/10/Noord-Korea-ziet-nieuwe-leider.dhtml 
# print url :   http://www.volkskrant.nl/vk/nl/2668/2010/article/print/detail/1031493/Noord-Korea-ziet-nieuwe-leider.dhtml
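The two example URLs suggest the print URL can be derived from the article URL by string rewriting alone, without opening the article page first. calibre recipes have a `print_version(self, url)` hook for this kind of mapping, so this is a possible alternative to the obfuscated-article approach above. The sketch below is only an assumption based on that ONE example pair: other sections, dates, or later site changes may use a different layout, and `to_print_url` is a hypothetical helper name.

```python
import re

# Assumption: URL layout inferred from a single example pair; verify it
# against several real article URLs before relying on it.
ARTICLE_RE = re.compile(
    r'^(?P<base>https?://[^/]+/vk/nl/\d+)'  # site prefix, e.g. .../vk/nl/2668
    r'/[^/]+'                               # section name, e.g. Buitenland
    r'/article/detail/(?P<id>\d+)'          # numeric article id
    r'/(?P<year>\d{4})/\d{2}/\d{2}'         # publication date; only the year is kept
    r'/(?P<slug>[^/]+)$'                    # slug, e.g. Noord-Korea-ziet-nieuwe-leider.dhtml
)


def to_print_url(url):
    """Rewrite an article URL to its print URL; return it unchanged if it doesn't match."""
    m = ARTICLE_RE.match(url)
    if m is None:
        return url
    return '{base}/{year}/article/print/detail/{id}/{slug}'.format(**m.groupdict())
```

Inside the recipe, this would be wired up as `def print_version(self, url): return to_print_url(url)`. Returning unmatched URLs unchanged means an unexpected layout degrades to the normal article page rather than a broken link.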