Thread: NRC Handelsblad
12-22-2010, 11:32 AM   #1
veezh
NRC Handelsblad

This recipe downloads the full EPUB edition of NRC Handelsblad, which is (temporarily?) available for free without a login.
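
The edition is fetched from a date-stamped URL. As a rough sketch (Python 3, standalone, not part of the recipe), this is how that URL is put together for a given date. The domain and path come from the recipe in this post; `edition_url` is a hypothetical helper name, and nothing here confirms that editions other than today's are actually served at these addresses.

```python
import datetime

# Domain and path pattern as used in the recipe in this post.
DOMAIN = "http://digitaleeditie.nrc.nl"


def edition_url(day):
    """Return the EPUB URL for the edition of the given date (hypothetical helper)."""
    stamp = day.strftime("%Y%m%d")  # e.g. 20101222
    return DOMAIN + "/digitaleeditie/helekrant/epub/nrc_" + stamp + ".epub"


if __name__ == "__main__":
    print(edition_url(datetime.date.today()))
```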

Thanks to Lars Jacob for his Taz Digiabo recipe, on which this is heavily based, and to Starson17, who posted a reference to it in the reusable code sticky.

Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-
#Based on Lars Jacob's Taz Digiabo recipe

__license__   = 'GPL v3'
__copyright__ = '2010, veezh'

'''
www.nrc.nl
'''
import os, urllib2, zipfile
import datetime, time
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class NRCHandelsblad(BasicNewsRecipe):

    title = u'NRC Handelsblad'
    description = u'De EPUB-versie van NRC'
    language = 'nl'
    lang = 'nl-NL'

    __author__ = 'veezh'

    conversion_options = {
        'no_default_epub_cover' : True
    }

    def build_index(self):
        # Editions are published under a date-stamped file name (YYYYMMDD).
        today = time.strftime("%Y%m%d")
        domain = "http://digitaleeditie.nrc.nl"

        url = domain + "/digitaleeditie/helekrant/epub/nrc_" + today + ".epub"

        try:
            f = urllib2.urlopen(url)
        except urllib2.HTTPError:
            # No login is involved; an HTTP error most likely means that
            # today's edition has not been published yet.
            self.report_progress(0,_('Could not download the edition'))
            raise ValueError("Today's paper is not yet available")

        tmp = PersistentTemporaryFile(suffix='.epub')
        self.report_progress(0,_('downloading epub'))
        tmp.write(f.read())
        f.close()
        tmp.close()

        zfile = zipfile.ZipFile(tmp.name, 'r')
        self.report_progress(0,_('extracting epub'))

        zfile.extractall(self.output_dir)

        # Close the archive; tmp was already closed after the download.
        zfile.close()
        index = os.path.join(self.output_dir, 'content.opf')

        self.report_progress(1,_('epub downloaded and extracted'))

        return index
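
The recipe writes the download to a temporary file before unzipping it. As a sketch only (Python 3, standalone, outside calibre), the same extraction step can also be done in memory, since an EPUB is a ZIP archive. `extract_epub` and `make_sample_epub` are hypothetical helpers, the sample archive is a stand-in for a real edition, and the assumption that `content.opf` sits at the archive root is taken from the recipe above.

```python
import io
import os
import tempfile
import zipfile


def extract_epub(data, output_dir):
    """Unpack EPUB bytes into output_dir and return the expected content.opf path."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        zf.extractall(output_dir)
    # Assumption from the recipe: the package file is content.opf at the root.
    return os.path.join(output_dir, "content.opf")


def make_sample_epub():
    """Build a tiny stand-in archive in memory (illustration only, not a valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.opf", "<package/>")
    return buf.getvalue()


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    index = extract_epub(make_sample_epub(), out_dir)
    print(index, os.path.exists(index))
```

With a real edition, `data` would be the bytes read from the download response instead of the sample archive.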