MobileRead Forums - View Single Post - Beneath Ceaseless Skies recipe for direct epub downloading

duckpuppy · 02-23-2011, 03:10 PM

Beneath Ceaseless Skies (BCS) is a free monthly digital publication for fantasy stories. I'm trying to massage the Now Toronto recipe into a BCS recipe. BCS has a feed for new issue announcements, and they provide PDF, epub, and mobi files for each issue. The problem is that the feed links neither to the published files nor to the issue page; the only link in each feed item goes to the discussion forum thread for that issue.

I have a semi-working recipe (at the bottom of this post). It grabs the most recent issue by parsing the issue number out of the feed, constructing a direct download link to the epub, downloading and unzipping it, and returning the content.opf file as the index.
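The issue-number step can be sketched on its own, outside calibre. This is only an illustration: the sample title is made up, the helper names `issue_number` and `epub_url` are hypothetical, and the URL pattern is simply the one used in the recipe in this post.

```python
import re

# URL pattern copied from the recipe in this post; {0:03} zero-pads to 3 digits.
EPUB_URL = ('http://www.beneath-ceaseless-skies.com/ebooks/'
            'BeneathCeaselessSkies_Issue{0:03}.epub')


def issue_number(title):
    """Return the number that follows the last '#' in a feed item title."""
    numbers = re.findall(r'#\s*(\d+)', title)
    if not numbers:
        raise ValueError('no issue number in title: %r' % title)
    return int(numbers[-1])


def epub_url(issue):
    """Build the direct epub download link for an issue number."""
    return EPUB_URL.format(issue)


# Hypothetical feed title, for illustration only.
print(epub_url(issue_number('Beneath Ceaseless Skies Issue #63')))
```

Raising an error when no number is found makes a changed title format fail loudly instead of producing a bad download link.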

I have the following problems:

The epub has a cover image, but it doesn't get used in the resulting re-zipped epub. The cover is blank.

There are usually multiple authors in each issue. The author metadata is just fine in the resulting epub, but the author sort is always just the "Last, First" form of the last author listed in the author metadata. I can manually edit the metadata and click the button to automatically set the author sort from the author, and it's fine, but I'd like to make sure it's set during the conversion.
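As a concrete picture of the desired result, here is a rough sketch of the author sort for a multi-author issue: each name flipped to "Last, First" and the results joined with " & ", the separator calibre uses between authors. The function name and the sample names are made up, and the flipping is deliberately naive; calibre's own algorithm handles prefixes, suffixes, and similar cases that this does not.

```python
def naive_author_sort(authors):
    """Turn ['First Last', ...] into 'Last, First & Last, First'."""
    flipped = []
    for name in authors:
        parts = name.split()
        if len(parts) < 2:
            # Single-word names are left unchanged.
            flipped.append(name.strip())
        else:
            # Treat the final word as the surname and move it to the front.
            flipped.append('%s, %s' % (parts[-1], ' '.join(parts[:-1])))
    return ' & '.join(flipped)


# Hypothetical author list, for illustration only.
print(naive_author_sort(['Jane Doe', 'John Q. Public']))
# -> Doe, Jane & Public, John Q.
```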

It will probably become apparent from the code below that I'm not a Python expert, though I am a software developer by day, so I've been able to muddle through getting this to work a little. Anybody out there able to help me with the problems I've listed?

As an added bonus, I'd love to set the series to "Beneath Ceaseless Skies" and set the series number to the issue number in the epub metadata... is that possible?
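For reference, calibre records series information in an OPF file as `meta` elements named `calibre:series` and `calibre:series_index`. The sketch below adds them to an OPF document using only the standard library (Python 3). Whether the recipe conversion honors them for an epub handled this way is an assumption that has not been verified here, and the helper name `add_series` is made up.

```python
import xml.etree.ElementTree as ET

OPF_NS = 'http://www.idpf.org/2007/opf'
DC_NS = 'http://purl.org/dc/elements/1.1/'

# Keep the usual prefixes when the document is written back out.
ET.register_namespace('', OPF_NS)
ET.register_namespace('dc', DC_NS)


def add_series(opf_xml, series, index):
    """Return the OPF text with calibre series metadata appended."""
    root = ET.fromstring(opf_xml)
    metadata = root.find('{%s}metadata' % OPF_NS)
    if metadata is None:
        raise ValueError('OPF has no <metadata> element')
    for name, content in (('calibre:series', series),
                          ('calibre:series_index', str(index))):
        meta = ET.SubElement(metadata, '{%s}meta' % OPF_NS)
        meta.set('name', name)
        meta.set('content', content)
    return ET.tostring(root, encoding='unicode')


# Minimal stand-in for a real content.opf, for illustration only.
sample = ('<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
          '<metadata/></package>')
print(add_series(sample, 'Beneath Ceaseless Skies', 63))
```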

Code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
#Based on Starson17's NowToronto recipe, which in turn was based on Lars Jacob's Taz Digiabo recipe

__license__ = 'GPL v3'
__copyright__ = '2011, DuckPuppy'

import os, urllib2, zipfile
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class BeneathCeaselessSkies(BasicNewsRecipe):
	title = u'Beneath Ceaseless Skies'
	description = u'Beneath Ceaseless Skies'
	__author__ = 'DuckPuppy'

	def build_index(self):
		# Forum feed that announces each new issue
		epub_feed = "http://www.beneath-ceaseless-skies.com/forums/external.php?type=rss2&forumids=2"
		soup = self.index_to_soup(epub_feed)
		# The first <item> in the feed is the most recent issue
		item = soup.find(name='item')
		title = item.find(name='title').string
		print 'Title: ' + title
		# The issue number is whatever follows the last '#' in the title
		issueloc = title.rfind("#")
		issue = title[issueloc+1:].encode('utf-8')
		print issue
		# Build the direct download link; the number is zero-padded to three digits
		url = u'http://www.beneath-ceaseless-skies.com/ebooks/BeneathCeaselessSkies_Issue{0:03}.epub'.format(int(issue))
		print url
		# Download the epub into a temporary file
		f = urllib2.urlopen(url)
		tmp = PersistentTemporaryFile(suffix='.epub')
		self.report_progress(0,_('downloading epub'))
		tmp.write(f.read())
		tmp.close()
		# Unzip it into the recipe's output directory
		zfile = zipfile.ZipFile(tmp.name, 'r')
		self.report_progress(0,_('extracting epub'))
		zfile.extractall(self.output_dir)
		zfile.close()
		# Hand the extracted OPF back to calibre as the index
		index = os.path.join(self.output_dir, 'content.opf')
		self.report_progress(1,_('epub downloaded and extracted'))
		return index
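
One assumption baked into the recipe is that the package file is named content.opf and sits at the top level of the archive. The EPUB container format records the actual location in META-INF/container.xml, so a more defensive lookup might look like the sketch below. It uses only the standard library and is written for Python 3; the helper name `find_opf_path` is made up and is not part of the recipe.

```python
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = 'urn:oasis:names:tc:opendocument:xmlns:container'


def find_opf_path(epub):
    """Return the archive-relative path of the OPF package file.

    `epub` may be a filename or a file-like object.
    """
    with zipfile.ZipFile(epub, 'r') as zf:
        container = ET.fromstring(zf.read('META-INF/container.xml'))
    # The first <rootfile> element points at the package document.
    rootfile = container.find('.//{%s}rootfile' % CONTAINER_NS)
    if rootfile is None or not rootfile.get('full-path'):
        raise ValueError('container.xml names no rootfile')
    return rootfile.get('full-path')
```

In the recipe, the returned path would be joined onto self.output_dir in place of the hard-coded 'content.opf'.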

Last edited by duckpuppy; 02-23-2011 at 03:13 PM.