Downloading older Economist issues

partymonkey · 03-15-2011, 06:17 PM

I have a subscription to The Economist, which gives me access to historical print issues.

I wanted to create a customized recipe, based on the built-in one, that would let me append a specific date to INDEX in order to generate the correct URL for retrieving older issues.

I know the URL I need to set INDEX to, but what I'm struggling with is whether there is a way to get calibre to prompt me for the specific date I'm interested in. I looked at the NY Times recipe because it has a login section, but I'm still trying to figure out how to accomplish this.

Can anyone help? I'm still learning how to customize recipes.

Thanks in advance.

kovidgoyal · 03-15-2011, 06:57 PM

Just overload the username field: pass in the date/issue number as the username and have the login code strip it out.
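A minimal sketch of that suggestion, assuming a hypothetical convention in which the username field holds the account e-mail and the issue date joined by "|" (for example me@example.com|20110305). The split_username helper and the "|" separator are illustrative only, not part of calibre; in a recipe you would apply it to self.username in the login code (get_browser) and log in with the e-mail part.

```python
def split_username(raw, sep='|'):
    """Split a hypothetical 'email|YYYYMMDD' username into (email, date).

    If no separator is present, the whole value is treated as the e-mail
    and the date is None (i.e. fetch the current issue).
    """
    if sep in raw:
        # rsplit so an unusual e-mail containing the separator stays intact
        email, date = raw.rsplit(sep, 1)
        return email.strip(), date.strip()
    return raw.strip(), None


print(split_username('me@example.com|20110305'))  # ('me@example.com', '20110305')
print(split_username('me@example.com'))           # ('me@example.com', None)
```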

partymonkey · 03-17-2011, 01:32 PM

OK, thanks for the tip.

I'm a bit over my head on this one. I'm still learning calibre and have never programmed in Python (but I understand objects, etc.).

I cannot get self.username to be recognized. Could you or someone else give me a bit more guidance?

In The Economist recipe, there is an attribute that holds the index URL (INDEX). I need to append the contents of self.username (into which I would put a date in the format 20110305, for example) to INDEX. So I need to end up with INDEX + self.username, but I'm struggling with where to do that in the recipe.
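A small sketch of that concatenation with basic validation. The base URL is the INDEX value used in the recipe posted later in this thread; whether that endpoint still serves old issues is not confirmed here, and index_url_for is a hypothetical helper you would call from economist_parse_index with self.username.

```python
import re

# INDEX value taken from the recipe posted later in this thread
INDEX = 'http://www.economist.com/printedition/index.cfm?d='


def index_url_for(issue_date, base=INDEX):
    """Return the print-edition index URL for a YYYYMMDD issue date.

    The URL is built in a local value rather than by reassigning the base,
    so calling this twice never yields '...d=2011030520110305'.
    """
    issue_date = issue_date.strip()
    if not re.match(r'\d{8}$', issue_date):
        raise ValueError('Expected a date like 20110305, got %r' % issue_date)
    return base + issue_date


print(index_url_for('20110305'))
```

Keeping the base untouched matters if the index fetch is ever retried.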

Thanks for any help you can provide.

(BTW, I do have the attribute of needs_subscription set to True, and I do get the prompt on the calibre interface).

kovidgoyal · 03-17-2011, 01:40 PM

Look at economist_parse_index

partymonkey · 03-21-2011, 08:29 PM

Thanks for your help.

I've finally been able to get this recipe to do what I wanted. My apologies if I'm violating Python/calibre good practices, but I figured I'd share it in case someone else has a digital or print subscription to The Economist and wants to download older issues.
In the username field, enter a date in the format YYYYMMDD, and in the password field just enter a single character. The date must match the issue date printed on that issue of The Economist. (Also, I believe you need to be logged into your Economist account in a browser first, so that a cookie gets created and remembers you, but I'm not 100% sure this is how it works.)

In any case, QUESTION: is there a way to set the Publish Date on the recipe to the actual issue date?
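The thread does not answer this. As a starting point, the date typed into the username field can at least be parsed into a real date object, which any publication-date field would need, and which also rejects impossible dates that plain string slicing would accept. This is a plain-Python sketch: issue_date_from_username is a hypothetical helper, and how (or whether) a recipe of this era can push the value into calibre's Publish Date metadata is not confirmed here.

```python
import datetime


def issue_date_from_username(raw):
    """Parse a YYYYMMDD string into a datetime.date (ValueError if invalid)."""
    return datetime.datetime.strptime(raw.strip(), '%Y%m%d').date()


d = issue_date_from_username('20110305')
print(d.isoformat())                                  # 2011-03-05
print('The Economist (%s)' % d.strftime('%Y/%m/%d'))  # The Economist (2011/03/05)
```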

Anyhow, here it is in case someone wants it.

Cheers.
==================

Code:

#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
economist.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

import string, time, re

class Economist(BasicNewsRecipe):

    title = 'The Economist (old issues)' 
    language = 'en'

    __author__ = "Kovid Goyal"
    INDEX = 'http://www.economist.com/printedition/index.cfm?d='
    description = ' - Global news and current affairs from a European perspective.'

    oldest_article = 7.0
    economist_cover_url = 'http://www.economist.com/images/images-magazine/'
    cover_url = None
    cover_suffix = '_CNA400.jpg'
    remove_tags = [
            dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
            dict(attrs={'class':['dblClkTrk', 'ec-article-info', 'share_inline_header']}),
            {'class': lambda x: x and 'share-links-header' in x},
    ]
    keep_only_tags = [dict(id='ec-article-body')]
    needs_subscription = True
    no_stylesheets = True
    preprocess_regexps = [(re.compile('</html>.*', re.DOTALL),
        lambda x:'</html>')]

    '''
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open('http://www.economist.com')
        req = mechanize.Request(
                'http://www.economist.com/members/members.cfm?act=exec_login',
                headers = {
                    'Referer':'http://www.economist.com/',
                    },
                data=urllib.urlencode({
                    'logging_in' : 'Y',
                    'returnURL'  : '/',
                    'email_address': self.username,
                    'fakepword' : 'Password',
                    'pword'     : self.password,
                    'x'         : '0',
                    'y'         : '0',
                    }))
        br.open(req).read()
        return br
    '''

    def get_cover_url(self):
        issuedate = self.username
        self.log.info('Issuedate is: ' + issuedate)
        iyr = issuedate[0:4]
        imo = issuedate[4:6]
        idt = issuedate[6:]
        self.cover_url = self.economist_cover_url + iyr + '/' + imo + '/' + idt + '/CN/' + issuedate + self.cover_suffix
        self.log.info('Cover url is: ' + self.cover_url)
        return self.cover_url

    def get_masthead_title(self):
        issuedate = self.username
        iyr = issuedate[0:4]
        imo = issuedate[4:6]
        idt = issuedate[6:]
        self.title = 'The Economist (' + iyr + '/' + imo + '/' + idt + ')'
        return self.title

    def parse_index(self):
        try:
            return self.economist_parse_index()
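        except Exception:
            # Retry once in case the first failure was transient
            self.log.warn(
                'Initial attempt to parse index failed, retrying in 30 seconds')
            time.sleep(30)
            return self.economist_parse_index()

    def economist_parse_index(self):
        # Build the issue URL locally (instead of modifying self.INDEX) so
        # calling this method more than once never appends the date twice
        index_url = self.INDEX + self.username.strip()
        soup = BeautifulSoup(self.browser.open(index_url).read(),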
                             convertEntities=BeautifulSoup.HTML_ENTITIES)
        index_started = False
        feeds = {}
        ans = []
        key = None
        for tag in soup.findAll(['h1', 'h2']):
            text = ''.join(tag.findAll(text=True))
            if tag.name in ('h1', 'h2') and 'Classified ads' in text:
                break
            if tag.name == 'h1':
                if 'The world this week' in text or 'The world this year' in text:
                    index_started = True
                if not index_started:
                    continue
                text = string.capwords(text)
                if text not in feeds:
                    feeds[text] = []
                if text not in ans:
                    ans.append(text)
                key = text
                continue
            if key is None:
                continue
            a = tag.find('a', href=True)
            if a is not None:
                url = a['href']
                m = re.search(r'story_id=(\d+)', url)
                if m is None:
                    # Not an article link (no story_id); skip it
                    continue
                url = 'http://www.economist.com/node/%s/print' % m.group(1)
                try:
                    # Prefix the title with the subtitle, if the preceding
                    # sibling element has one
                    subtitle = tag.previousSibling.contents[0].contents[0]
                    text = subtitle + ': ' + text
                except Exception:
                    pass
                article = dict(title=text,
                    url = url,
                    description='', content='', date='')
                feeds[key].append(article)

        ans = [(key, feeds[key]) for key in ans if key in feeds]
        if not ans:
            raise Exception('Could not find any articles. Has your subscription expired?')
        return ans

    def eco_find_image_tables(self, soup):
        for x in soup.findAll('table', align=['right', 'center']):
            if len(x.findAll('font')) in (1,2) and len(x.findAll('img')) == 1:
                yield x

    def postprocess_html(self, soup, first):
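        body = soup.find('body')
        if body is not None:
            # Iterate over a copy: deleting from body.attrs while looping
            # over it directly would skip some attributes
            for name, val in list(body.attrs):
                del body[name]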

        for table in list(self.eco_find_image_tables(soup)):
            caption = table.find('font')
            img = table.find('img')
            div = Tag(soup, 'div')
            div['style'] = 'text-align:left;font-size:70%'
            ns = NavigableString(self.tag_to_string(caption))
            div.insert(0, ns)
            div.insert(1, Tag(soup, 'br'))
            del img['width']
            del img['height']
            img.extract()
            div.insert(2, img)
            table.replaceWith(div)
        return soup
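A guarded, standalone version of the story_id lookup used in economist_parse_index, returning None instead of failing when a link carries no story_id. The example hrefs below are made up for illustration; the /node/<id>/print pattern is the one the recipe above uses.

```python
import re

STORY_ID = re.compile(r'story_id=(\d+)')


def print_url(href):
    """Map an index link containing 'story_id=<digits>' to the
    /node/<id>/print URL, or return None if there is no story_id."""
    m = STORY_ID.search(href)
    if m is None:
        return None
    return 'http://www.economist.com/node/%s/print' % m.group(1)


print(print_url('/displaystory.cfm?story_id=18332702'))
print(print_url('/blogs/some-post'))
```

The index loop can then skip any heading whose link maps to None.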

mandyahl · 08-30-2011, 05:17 AM

I am also trying to download older Economist issues using calibre and am getting the error message asking if my subscription has expired (it hasn't). Downloading the most recent issue works perfectly, but I can't figure out how to do older issues. Any help would be very much appreciated. Wonderful tool you've created!

Aeon · 10-15-2011, 05:22 PM

How do I go about modifying recipes in the OS X version of calibre?

EDIT: After a good 15 minutes of looking everywhere to no avail, I tried a simple "right-click" on the Fetch News button.

My bad.

Aeon · 10-15-2011, 06:45 PM

The script provided by partymonkey does not work for me.

It would be very nice to have a standard recipe for fetching older issues. I'm positive it would be a quick fix to add this to the current recipe.

EDIT:

Quote:

needs_subscription = True

def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username and self.password:
        br.open('http://www.economist.com/user/login')
        br.select_form(nr=1)
        br['name'] = self.username
        br['pass'] = self.password
        res = br.submit()
        raw = res.read()
        if '>Log out<' not in raw:
            raise ValueError('Failed to login to economist.com. '
                'Check your username and password.')
    return br

def get_cover_url(self):
    br = self.browser
    br.open(self.INDEX)
    issue = '2011-09-10'
    self.log('Fetching cover for issue: %s'%issue)
    cover_url = "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg" %(issue.translate(None,'-'))
    return cover_url

This does not work either.
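The quoted get_cover_url hardcodes issue = '2011-09-10'. A sketch of deriving the same cover URL from a date entered by the user instead, reusing the media.economist.com pattern from the quote; whether that pattern holds for every back issue is an assumption, and cover_url_for is a hypothetical helper.

```python
# URL pattern copied from the quoted get_cover_url
COVER_TEMPLATE = ('http://media.economist.com/sites/default/files/imagecache/'
                  'print-cover-full/print-covers/%s_CNA400.jpg')


def cover_url_for(issue_date):
    """Build the cover URL for an issue date given as YYYYMMDD or YYYY-MM-DD."""
    issue_date = issue_date.strip().replace('-', '')  # '2011-09-10' -> '20110910'
    return COVER_TEMPLATE % issue_date


print(cover_url_for('2011-09-10'))
```

Using str.replace instead of issue.translate(None, '-') also keeps the code working under both Python 2 and Python 3.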

pietvo · 10-17-2011, 09:23 AM

See https://www.mobileread.com/forums/sho...24&postcount=4

awitko · 11-04-2011, 11:21 AM

The Economist recipe does not seem to work for me. Is the standard recipe working for others? I notice there are two Economist recipes among the built-in ones, and a diff confirms they are different. I used the one with a username/password block, but it does not work. I'd like to download the full current issue; if the capability for past issues could be included, that would be great! Please do share if you have a working Economist recipe.

Thanks in advance!

Alex

ang002 · 11-28-2011, 10:09 AM

PREFACE: I am completely new at this and have no experience with any kind of computer languages.

SITUATION: I downloaded calibre and inserted the recipe (attached as a *.TXT file, though originally a *.RECIPE file) in an attempt to download previous issues of The Economist with my paid subscription to the print edition. Despite my persistence and best efforts, I continue to receive the ERROR message below. I've tried downloading content from other news sources to verify that calibre is working on my PC, and it is. I've even had perfect success with the current issue of The Economist; it is only when attempting to download previous issues that I hit a snag. Any help would be much appreciated, as I am a new owner of a Kindle and would love to read my only magazine subscription on it! Thank you!

calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Fetch news from The Economist (old issues)

Fetch news from The Economist (old issues)
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_compress': False,
'dont_download_recipe': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x05743890>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'mobi_ignore_margins': False,
'mobi_toc_at_start': False,
'no_chapters_in_toc': False,
'no_inline_navbars': True,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0x05743BB0>,
'page_breaks_before': None,
'password': '<redacted>',
'personal_doc': '[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'rescale_images': False,
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'username': '<redacted>',
'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
Could not find any articles. Has your subscription expired? (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 959, in run
File "site-packages\calibre\customize\conversion.py", line 204, in __call__
File "site-packages\calibre\web\feeds\input.py", line 105, in convert
File "site-packages\calibre\web\feeds\news.py", line 824, in download
File "site-packages\calibre\web\feeds\news.py", line 968, in build_index
File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 63, in parse_index
return self.economist_parse_index()
File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 118, in economist_parse_index
raise Exception('Could not find any articles. Has your subscription expired?')
Exception: Could not find any articles. Has your subscription expired?
