10-16-2011, 01:41 AM   #1
Aeon
The Economist - past issues

I am trying to modify the current recipe for The Economist so that I can download past issues of the magazine.

This code works perfectly well, but the recipe has to be edited by hand each time I want a different issue:

Spoiler:
#!/usr/bin/env python

__license__ = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
economist.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString
from collections import OrderedDict

import time, re

class Economist(BasicNewsRecipe):

    title = 'The Economist - past issues'
    INDEX = 'http://www.economist.com/printedition/2011-09-10'
    language = 'en'
    __author__ = "Kovid Goyal"
    description = ('Global news and current affairs from a European'
            ' perspective. Best downloaded on Friday mornings (GMT)')
    extra_css = '.headline {font-size: x-large;} \n h2 { font-size: small; } \n h1 { font-size: medium; }'
    oldest_article = 7.0
    remove_tags = [
            dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
            dict(attrs={'class':['dblClkTrk', 'ec-article-info',
                'share_inline_header', 'related-items']}),
            {'class': lambda x: x and 'share-links-header' in x},
    ]
    keep_only_tags = [dict(id='ec-article-body')]
    needs_subscription = False
    no_stylesheets = True
    preprocess_regexps = [(re.compile('</html>.*', re.DOTALL),
        lambda x:'</html>')]

    # economist.com has started throttling after about 60% of the total has
    # downloaded with connection reset by peer (104) errors.
    delay = 1

    def get_cover_url(self):
        br = self.browser
        br.open(self.INDEX)
        self.log('Fetching cover for issue: ')
        cover_url = "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/20110910_CNA400.jpg"
        return cover_url


    def parse_index(self):
        try:
            return self.economist_parse_index()
        except:
            raise
            self.log.warn(
                'Initial attempt to parse index failed, retrying in 30 seconds')
            time.sleep(30)
            return self.economist_parse_index()

    def economist_parse_index(self):
        soup = self.index_to_soup(self.INDEX)
        div = soup.find('div', attrs={'class':'issue-image'})
        if div is not None:
            img = div.find('img', src=True)
            if img is not None:
                self.cover_url = img['src']
        feeds = OrderedDict()
        for section in soup.findAll(attrs={'class':lambda x: x and 'section' in
            x}):
            h4 = section.find('h4')
            if h4 is None:
                continue
            section_title = self.tag_to_string(h4).strip()
            if not section_title:
                continue
            self.log('Found section: %s'%section_title)
            articles = []
            subsection = ''
            for node in section.findAll(attrs={'class':'article'}):
                subsec = node.findPreviousSibling('h5')
                if subsec is not None:
                    subsection = self.tag_to_string(subsec)
                prefix = (subsection+': ') if subsection else ''
                a = node.find('a', href=True)
                if a is not None:
                    url = a['href']
                    if url.startswith('/'): url = 'http://www.economist.com'+url
                    url += '/print'
                    title = self.tag_to_string(a)
                    if title:
                        title = prefix + title
                        self.log('\tFound article:', title)
                        articles.append({'title':title, 'url':url,
                            'description':'', 'date':''})

            if articles:
                if section_title not in feeds:
                    feeds[section_title] = []
                feeds[section_title] += articles

        ans = [(key, val) for key, val in feeds.iteritems()]
        if not ans:
            raise Exception('Could not find any articles, either the '
                    'economist.com server is having trouble and you should '
                    'try later or the website format has changed and the '
                    'recipe needs to be updated.')
        return ans

    def eco_find_image_tables(self, soup):
        for x in soup.findAll('table', align=['right', 'center']):
            if len(x.findAll('font')) in (1,2) and len(x.findAll('img')) == 1:
                yield x

    def postprocess_html(self, soup, first):
        body = soup.find('body')
        for name, val in body.attrs:
            del body[name]

        for table in list(self.eco_find_image_tables(soup)):
            caption = table.find('font')
            img = table.find('img')
            div = Tag(soup, 'div')
            div['style'] = 'text-align:left;font-size:70%'
            ns = NavigableString(self.tag_to_string(caption))
            div.insert(0, ns)
            div.insert(1, Tag(soup, 'br'))
            del img['width']
            del img['height']
            img.extract()
            div.insert(2, img)
            table.replaceWith(div)
        return soup


Now I would like to use the input from the username field to modify INDEX. Something like:
Code:
issue = self.username
INDEX = 'http://www.economist.com/printedition/%s' % (issue.translate(None,'-'))
This doesn't work, no matter how I write it. I'm a bit stuck.

Any hint on how I can modify INDEX with data from self.username? Would that mean modifying __init__ files?
10-16-2011, 02:35 AM   #2
kovidgoyal
creator of calibre
In Python, self refers to an object instantiated from a class, so you cannot use it in class-level variables. Instead of trying to use it with INDEX, you should replace the uses of INDEX in the rest of the recipe.
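
To illustrate the distinction, here is a minimal standalone sketch in plain Python (no calibre needed). `FakeRecipe`, `index_url` and `class_body_can_see_self` are made-up names for illustration, and the `username` argument only stands in for the value calibre stores on the recipe instance; none of this is calibre API.

```python
class FakeRecipe(object):
    # The class body runs once, when the class statement is executed.
    # No instance exists at that point, so `self` is not defined here.
    BASE = 'http://www.economist.com/printedition/'

    def __init__(self, username):
        # Hypothetical stand-in for the user input stored on the instance.
        self.username = username

    def index_url(self):
        # Inside a method, `self` is the instance, so its data is available.
        return self.BASE + self.username


def class_body_can_see_self():
    """Return True if a class body can reference `self` (it cannot)."""
    try:
        class Broken(object):
            # Same idea as the INDEX line in the first post.
            INDEX = 'http://www.economist.com/printedition/%s' % self.username
    except NameError:
        return False
    return True


recipe = FakeRecipe('2011-09-10')
print(recipe.index_url())
print(class_body_can_see_self())
```

The URL can only be built once an instance exists, which is why the lookup has to move out of the class-level INDEX assignment and into a method.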
10-16-2011, 01:20 PM   #3
Aeon
I have tried that and it doesn't work. The variable that is used has to be defined in the class Economist. If I define it later, I get an error:

Quote:
Python function terminated unexpectedly: 'Economist' object has no attribute 'variable'
I also can't simply create an empty INDEX variable in the class and then fill it with whatever I want. The fetching doesn't work if I do that.

Global variables don't work either. self.username is only available inside methods, so I can't use the user input while defining class variables...
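
One plausible cause of a "has no attribute" error like the one quoted above is call order: an instance attribute exists only after the method that assigns it has run. The sketch below shows that in plain Python. `FakeRecipe`, `set_index` and `lazy_index` are hypothetical names, not calibre API, and this is only a guess at what happened in the recipe.

```python
class FakeRecipe(object):
    def __init__(self, username):
        # Hypothetical stand-in for the user input stored on the instance.
        self.username = username

    def set_index(self):
        # The attribute `index` is created only when this method is called.
        self.index = 'http://www.economist.com/printedition/' + self.username

    @property
    def lazy_index(self):
        # Computed on each access, so it does not depend on call order.
        return 'http://www.economist.com/printedition/' + self.username


r = FakeRecipe('2011-09-10')

try:
    r.index  # set_index() has not run yet
    missing = False
except AttributeError:
    missing = True

r.set_index()  # from here on, r.index exists
print(missing, r.index == r.lazy_index)
```

So either the attribute has to be assigned in a method that is guaranteed to run before it is first read, or the value has to be computed on demand.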
10-17-2011, 08:43 AM   #4
pietvo
You can get rid of the class variable INDEX and just use an instance variable. Like this:
Code:
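    def get_browser(self):
        # self.username only exists on the instance, so build INDEX here
        self.INDEX = 'http://www.economist.com/printedition/'+self.username
        # the base-class method must be given self explicitly
        return BasicNewsRecipe.get_browser(self)

    def get_cover_url(self):
        br = self.browser
        br.open(self.INDEX)
        date = self.username.replace('-', '')
        self.log('Fetching cover for issue: ' + date)
        # same path (including print-covers/) as the hard-coded URL in the original recipe
        cover_url =  "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg" % date
        return cover_url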
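
Since the issue date now comes from free-form user input, it may be worth checking it before building the URLs. The following is a minimal sketch, not part of the recipe: `issue_urls`, `INDEX_BASE` and `COVER_BASE` are hypothetical names, the cover path is copied from the hard-coded URL in the first post, and it is an assumption that the `_CNA400` suffix is the same for every issue.

```python
from datetime import datetime

INDEX_BASE = 'http://www.economist.com/printedition/'
# Assumption: path copied from the hard-coded cover URL in the first post.
COVER_BASE = ('http://media.economist.com/sites/default/files/imagecache/'
              'print-cover-full/print-covers/')


def issue_urls(issue):
    """Return (index_url, cover_url) for an issue date given as YYYY-MM-DD."""
    # strptime raises ValueError for anything that is not a real date.
    date = datetime.strptime(issue.strip(), '%Y-%m-%d')
    index_url = INDEX_BASE + date.strftime('%Y-%m-%d')
    # Assumption: every cover file uses the same _CNA400 suffix.
    cover_url = COVER_BASE + date.strftime('%Y%m%d') + '_CNA400.jpg'
    return index_url, cover_url


index_url, cover_url = issue_urls('2011-09-10')
print(index_url)
print(cover_url)
```

Inside the recipe, the same function could be called with self.username from get_browser. As far as I know, calibre only asks for a username when needs_subscription is set, and the posted recipe has it as False, so that setting would presumably need to change as well.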