02-02-2010, 01:55 PM   #1329
JaClar
Device: PRS 600
TAZ Recipe

Quote:
Originally Posted by Baumi
My recipe wish is a little bit unusual in that I don't require: The German newspaper taz already provides an epub edition (without DRM) for subscribers. When you go to http://www.taz.de/epub and enter valid credentials in the htaccess form, the most current epub is automatically downloaded. To read it on my Cybook Gen3 with 1.5 firmware, I then use calibre to convert it, tag it as "News" and upload it to the reader.

...

Thanks for any info.
I made a recipe for TAZ, following kovidgoyal's suggestion to override the build_index() method:

Code:
#!/usr/bin/env  python
# -*- coding: utf-8 -*-

__license__   = 'GPL v3'
__copyright__ = '2010, Lars Jacob jacob.lars at gmail.com'
__docformat__ = 'restructuredtext de'

'''
www.taz.de/digiabo
'''
import os, re, urllib2, zipfile, tempfile
from calibre.web.feeds.news import BasicNewsRecipe

class TazDigiabo(BasicNewsRecipe):

    title = u'Taz Digiabo'
    description = u'Das EPUB DigiAbo der Taz'
    language = 'de'
    lang = 'de-DE'

    __author__ = 'Lars Jacob'
    needs_subscription = True

    conversion_options = {
        'no_default_epub_cover' : True
    }

    def build_index(self):
        # calibre asks for credentials because needs_subscription is True
        if self.username is None or self.password is None:
            raise ValueError('taz username and password are required')

        domain = "http://www.taz.de"
        url = domain + "/digitaz/.digiabo"

        # The listing page links to the current issue as taz_YYYY_MM_DD.epub
        index = urllib2.urlopen(url)
        reg = r'<a href="([^"]*)">taz_[0-9]{4}_[0-9]{2}_[0-9]{2}\.epub</a>'
        find = re.search(reg, index.read())
        if find is None:
            raise ValueError('No epub link found on %s' % url)
        issue = domain + find.group(1)

        # The epub itself sits behind HTTP basic auth (realm TAZ-ABO)
        auth_handler = urllib2.HTTPBasicAuthHandler()
        auth_handler.add_password(realm='TAZ-ABO',
                                  uri=issue,
                                  user=self.username,
                                  passwd=self.password)
        opener = urllib2.build_opener(auth_handler)

        try:
            f = opener.open(issue)
        except urllib2.HTTPError:
            self.report_progress(0, _('Can\'t login to download %s.') % issue)
            raise

        # Buffer the download in a temporary file, then unpack it into output_dir
        tmp = tempfile.TemporaryFile()
        try:
            self.report_progress(0, _('downloading epub'))
            tmp.write(f.read())

            self.report_progress(0, _('extracting epub'))
            zfile = zipfile.ZipFile(tmp, 'r')
            zfile.extractall(self.output_dir)
            zfile.close()
        finally:
            tmp.close()

        # Assumes the package file is named content.opf at the archive root
        index = os.path.join(self.output_dir, 'content.opf')
        self.report_progress(1, _('epub downloaded and extracted'))
        return index
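
For anyone adapting the recipe: the link-matching step can be tried on its own, without logging in. This is only a sketch. It uses the same pattern as the recipe, but the helper name find_issue_url and the sample HTML (including its href path) are made up for illustration, and it is written for Python 3 rather than the Python 2 calibre uses here.

```python
import re

# Same pattern as in the recipe, written as a raw string.
ISSUE_LINK = re.compile(
    r'<a href="([^"]*)">taz_[0-9]{4}_[0-9]{2}_[0-9]{2}\.epub</a>'
)


def find_issue_url(html, domain="http://www.taz.de"):
    """Return the absolute URL of the first taz_YYYY_MM_DD.epub link, or None."""
    match = ISSUE_LINK.search(html)
    if match is None:
        return None
    return domain + match.group(1)


# Made-up listing fragment; the real page layout may differ.
sample = (
    '<li><a href="/example/path/taz_2010_02_02.epub">'
    'taz_2010_02_02.epub</a></li>'
)

print(find_issue_url(sample))
print(find_issue_url("<p>no issue today</p>"))
```

Returning None for a page without a matching link makes the "no issue found" case explicit, so the caller can report it instead of failing on find.group(1).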


I have to say that I'm a little underwhelmed by the result. Calibre reformats the whole book, which actually works quite well, but it mangles the header quite a bit... It would be much nicer if Calibre supported direct download of epub and other ebook files.
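
One caveat on the extraction step: the recipe assumes the package file is called content.opf and sits at the top level of the archive. That may well be true for the taz epub, but the EPUB container format records the real location in META-INF/container.xml. Here is a rough sketch of reading it with the standard library only. The names find_opf_path and make_demo_epub are made up for illustration, the demo archive is built in memory, and the code is Python 3.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by META-INF/container.xml in EPUB archives.
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def find_opf_path(epub_file):
    """Return the archive-relative path of the package (.opf) file."""
    with zipfile.ZipFile(epub_file) as zf:
        container = ET.fromstring(zf.read("META-INF/container.xml"))
    rootfile = container.find(
        "{%s}rootfiles/{%s}rootfile" % (CONTAINER_NS, CONTAINER_NS)
    )
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return rootfile.get("full-path")


def make_demo_epub(opf_path):
    """Build a tiny in-memory archive that only mimics the EPUB layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(
            "META-INF/container.xml",
            '<?xml version="1.0"?>'
            '<container version="1.0" xmlns="%s">'
            '<rootfiles><rootfile full-path="%s" '
            'media-type="application/oebps-package+xml"/></rootfiles>'
            "</container>" % (CONTAINER_NS, opf_path),
        )
        zf.writestr(opf_path, "<package/>")
    buf.seek(0)
    return buf


print(find_opf_path(make_demo_epub("OEBPS/content.opf")))
```

In the recipe, the returned path could be joined onto self.output_dir in place of the hard-coded 'content.opf', though I have not tried that against the actual taz file.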

cheers,
jaclar