Old 03-15-2011, 06:17 PM   #1
partymonkey
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
Downloading older Economist issues

I have a subscription to The Economist, which gives me access to historical print issues.

I wanted to create a customized recipe based on the built-in recipe that would allow me to pass a specific date format to the INDEX in order to generate the correct url to retrieve the older issues.

I know the URL I need to set INDEX to, but what I'm struggling with is whether there is a way to get calibre to prompt me for the specific date I'm interested in. I looked at the NY Times recipe because it has a login section, but I'm still trying to figure out how to accomplish this.

Can anyone help? I'm still learning how to customize recipes.

Thanks in advance.
Old 03-15-2011, 06:57 PM   #2
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Just overload the username field: pass in the date/issue number in the username and have the login code strip it out.
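One way to read this suggestion, as a minimal sketch: the username field carries both the real login and the issue date, and the recipe splits them apart before logging in. The `|` separator and the name `split_username` are assumptions made for illustration; the thread does not fix a format.

```python
def split_username(raw, sep='|'):
    """Split an overloaded username into (login, issue_date).

    Hypothetical convention: 'login|YYYYMMDD'. If the separator is
    missing, the whole value is treated as the login and the date is None.
    """
    login, found, issue_date = raw.partition(sep)
    if not found:
        return raw.strip(), None
    return login.strip(), issue_date.strip()


print(split_username('reader@example.com|20110305'))
print(split_username('reader@example.com'))
```

In a recipe, the first element would go to the login form and the second would be appended to the index URL.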
Old 03-17-2011, 01:32 PM   #3
partymonkey
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
OK, thanks for the tip.

I'm in a bit over my head on this one; I'm still learning calibre and have never programmed in Python (but I understand objects, etc.).

I cannot get self.username to be recognized. So could you or someone feed me a bit more guidance?

In The Economist recipe, there is an attribute/variable that holds the index URL (INDEX). I need to concatenate the contents of self.username (in which I would put a date in the format 20110305, for example) to INDEX. So I need to end up with INDEX + self.username, but I'm struggling with where to accomplish that in the recipe.

Thanks for any help you can provide.

(BTW, I do have the attribute of needs_subscription set to True, and I do get the prompt on the calibre interface).
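The concatenation asked about here (INDEX plus the date typed into the username field) can be sketched outside calibre as a plain function. The `INDEX` value is the one used in the recipe posted later in this thread; the name `build_index_url` and the format check are assumptions added for illustration.

```python
INDEX = 'http://www.economist.com/printedition/index.cfm?d='


def build_index_url(issue_date, base=INDEX):
    """Return the index URL for one issue, i.e. base + 'YYYYMMDD'."""
    issue_date = issue_date.strip()
    # Reject anything that is not exactly eight digits before building the URL.
    if len(issue_date) != 8 or not issue_date.isdigit():
        raise ValueError('Expected a date as YYYYMMDD, got %r' % issue_date)
    return base + issue_date


print(build_index_url('20110305'))
```

Inside a recipe the same expression would use `self.INDEX` and `self.username`, in whichever method fetches the index page.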
Old 03-17-2011, 01:40 PM   #4
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Look at economist_parse_index
Old 03-21-2011, 08:29 PM   #5
partymonkey
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
Thanks for your help.

I've finally been able to get this recipe to accomplish what I wanted. My apologies if I'm violating Python/calibre good practices, but I figured I'd share this in case someone else has a digital or print subscription to The Economist and wants to download older issues.
In the username field, enter a date in the format YYYYMMDD, and in the password field just enter a single character. The date must correspond to the issue date of The Economist issue. (Also, I believe you need to be logged into your Economist account in a browser first, so that a cookie gets created and remembers you, but I'm not 100% sure this is the way it works.)
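Because the date goes in as free text, a small check before fetching anything can catch typos early. This is a sketch only: the function name `is_valid_issue_date` is hypothetical, and it confirms only that the text is a real calendar date in YYYYMMDD form, not that an issue was published on that date.

```python
from datetime import datetime


def is_valid_issue_date(text):
    """Return True if text is a real calendar date written as YYYYMMDD."""
    if len(text) != 8 or not text.isdigit():
        return False
    try:
        datetime.strptime(text, '%Y%m%d')
    except ValueError:
        # Eight digits, but not a real date (for example month 13).
        return False
    return True


print(is_valid_issue_date('20110305'))
print(is_valid_issue_date('20111332'))
```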

In any case, QUESTION: is there a way to set the Publish Date on the recipe to the actual issue date?
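The thread does not answer this question. As a hedged sketch: later calibre releases document a `publication_date()` method on `BasicNewsRecipe` that returns a `datetime`; whether the 2011 version discussed here supports it is an assumption, and calibre may expect a timezone-aware value. The conversion itself is plain Python, and `issue_datetime` is a hypothetical helper name.

```python
from datetime import datetime


def issue_datetime(issue_date):
    """Convert 'YYYYMMDD' into a datetime at midnight on the issue date."""
    return datetime.strptime(issue_date, '%Y%m%d')


# Hypothetical use inside a recipe class, assuming the calibre version
# in use calls a publication_date() method on the recipe:
#
#     def publication_date(self):
#         return issue_datetime(self.username)

print(issue_datetime('20110305').isoformat())
```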

Anyhow, here it is in case someone wants it.

Cheers.
==================

Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
economist.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

import string, time, re

class Economist(BasicNewsRecipe):

    title = 'The Economist (old issues)' 
    language = 'en'

    __author__ = "Kovid Goyal"
    INDEX = 'http://www.economist.com/printedition/index.cfm?d='
    description = ' - Global news and current affairs from a European perspective.'

    oldest_article = 7.0
    economist_cover_url = 'http://www.economist.com/images/images-magazine/'
    cover_url = None
    cover_suffix = '_CNA400.jpg'
    remove_tags = [
            dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
            dict(attrs={'class':['dblClkTrk', 'ec-article-info', 'share_inline_header']}),
            {'class': lambda x: x and 'share-links-header' in x},
    ]
    keep_only_tags = [dict(id='ec-article-body')]
    needs_subscription = True
    no_stylesheets = True
    preprocess_regexps = [(re.compile('</html>.*', re.DOTALL),
        lambda x:'</html>')]

    '''
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open('http://www.economist.com')
        req = mechanize.Request(
                'http://www.economist.com/members/members.cfm?act=exec_login',
                headers = {
                    'Referer':'http://www.economist.com/',
                    },
                data=urllib.urlencode({
                    'logging_in' : 'Y',
                    'returnURL'  : '/',
                    'email_address': self.username,
                    'fakepword' : 'Password',
                    'pword'     : self.password,
                    'x'         : '0',
                    'y'         : '0',
                    }))
        br.open(req).read()
        return br
    '''

    def get_cover_url(self):
        # The issue date (YYYYMMDD) is passed in through the username field.
        issuedate = self.username
        self.log.info('Issuedate is: ' + issuedate)
        iyr = issuedate[0:4]  # year
        imo = issuedate[4:6]  # month
        idt = issuedate[6:8]  # day
        self.cover_url = self.economist_cover_url + iyr + '/' + imo + '/' + idt + '/CN/' + issuedate + self.cover_suffix
        self.log.info('Cover url is: ' + self.cover_url)
        return self.cover_url

    def get_masthead_title(self):
        # Show the issue date (taken from the username field) in the title.
        issuedate = self.username
        iyr = issuedate[0:4]  # year
        imo = issuedate[4:6]  # month
        idt = issuedate[6:8]  # day
        self.title = 'The Economist (' + iyr + '/' + imo + '/' + idt + ')'
        return self.title

    def parse_index(self):
        try:
            return self.economist_parse_index()
        except Exception:
            # Retry once after a pause; a second failure propagates.
            self.log.warn(
                'Initial attempt to parse index failed, retrying in 30 seconds')
            time.sleep(30)
            return self.economist_parse_index()

    def economist_parse_index(self):
        # Build the issue URL locally so a retry does not append the date twice.
        index_url = self.INDEX + self.username
        soup = BeautifulSoup(self.browser.open(index_url).read(),
                             convertEntities=BeautifulSoup.HTML_ENTITIES)
        index_started = False
        feeds = {}
        ans = []
        key = None
        for tag in soup.findAll(['h1', 'h2']):
            text = ''.join(tag.findAll(text=True))
            if tag.name in ('h1', 'h2') and 'Classified ads' in text:
                break
            if tag.name == 'h1':
                if 'The world this week' in text or 'The world this year' in text:
                    index_started = True
                if not index_started:
                    continue
                text = string.capwords(text)
                if text not in feeds.keys():
                    feeds[text] = []
                if text not in ans:
                    ans.append(text)
                key = text
                continue
            if key is None:
                continue
            a = tag.find('a', href=True)
            if a is not None:
                url = a['href']
                # Skip links without a story_id instead of failing on None.
                m = re.search(r'story_id=(\d+)', url)
                if m is None:
                    continue
                url = 'http://www.economist.com/node/%s/print' % m.group(1)
                try:
                   subtitle = tag.previousSibling.contents[0].contents[0]
                   text = subtitle + ': ' + text
                except:
                   pass
                article = dict(title=text,
                    url = url,
                    description='', content='', date='')
                feeds[key].append(article)

        ans = [(key, feeds[key]) for key in ans if key in feeds]
        if not ans:
            raise Exception('Could not find any articles. Has your subscription expired?')
        return ans

    def eco_find_image_tables(self, soup):
        for x in soup.findAll('table', align=['right', 'center']):
            if len(x.findAll('font')) in (1,2) and len(x.findAll('img')) == 1:
                yield x

    def postprocess_html(self, soup, first):
        body = soup.find('body')
        # Iterate over a copy: deleting attributes mutates body.attrs.
        for name, val in list(body.attrs):
            del body[name]

        for table in list(self.eco_find_image_tables(soup)):
            caption = table.find('font')
            img = table.find('img')
            div = Tag(soup, 'div')
            div['style'] = 'text-align:left;font-size:70%'
            ns = NavigableString(self.tag_to_string(caption))
            div.insert(0, ns)
            div.insert(1, Tag(soup, 'br'))
            del img['width']
            del img['height']
            img.extract()
            div.insert(2, img)
            table.replaceWith(div)
        return soup

Last edited by kovidgoyal; 03-21-2011 at 08:49 PM.
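The recipe above slices the date string in two separate methods. As a sketch of how that logic could be shared, here it is as standalone functions. The base URL and suffix are copied from the recipe; the function names are hypothetical, and whether the URL pattern matches the site's actual layout is not verified here.

```python
COVER_BASE = 'http://www.economist.com/images/images-magazine/'
COVER_SUFFIX = '_CNA400.jpg'


def split_issue_date(issue_date):
    """Return the (year, month, day) substrings of a 'YYYYMMDD' string."""
    return issue_date[0:4], issue_date[4:6], issue_date[6:8]


def cover_url_for(issue_date):
    """Build the cover image URL the same way get_cover_url does."""
    year, month, day = split_issue_date(issue_date)
    return (COVER_BASE + year + '/' + month + '/' + day + '/CN/'
            + issue_date + COVER_SUFFIX)


def masthead_title_for(issue_date):
    """Build the title the same way get_masthead_title does."""
    return 'The Economist (%s/%s/%s)' % split_issue_date(issue_date)


print(cover_url_for('20110305'))
print(masthead_title_for('20110305'))
```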
Old 08-30-2011, 05:17 AM   #6
mandyahl
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2011
Device: Kindle
I am also trying to download older Economist issues using calibre and am getting the error message asking if my subscription has expired (it hasn't). Downloading the most recent issue works perfectly, but I can't figure out how to do older issues. Any help would be very much appreciated. Wonderful tool you've created!
Old 10-15-2011, 05:22 PM   #7
Aeon
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2011
Device: Kindle 4
How do I go about modifying recipes with the OSX version of calibre?

EDIT: After a good 15 minutes of looking everywhere to no avail, I tried a simple "right-click" on the Fetch News button.

My bad.
Old 10-15-2011, 06:45 PM   #8
Aeon
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2011
Device: Kindle 4
The script provided by partymonkey does not work for me.

It would be very nice to have a standard recipe for fetching older issues. I'm positive it would be a quick fix to add this to the current recipe.

EDIT:
Quote:
needs_subscription = True

def get_browser(self):
    br = BasicNewsRecipe.get_browser()
    if self.username and self.password:
        br.open('http://www.economist.com/user/login')
        br.select_form(nr=1)
        br['name'] = self.username
        br['pass'] = self.password
        res = br.submit()
        raw = res.read()
        if '>Log out<' not in raw:
            raise ValueError('Failed to login to economist.com. '
                'Check your username and password.')
    return br


def get_cover_url(self):
    br = self.browser
    br.open(self.INDEX)
    issue = '2011-09-10'
    self.log('Fetching cover for issue: %s'%issue)
    cover_url = "http://media.economist.com/sites/default/files/imagecache/print-cover-full/print-covers/%s_CNA400.jpg" %(issue.translate(None,'-'))
    return cover_url
This does not work either.

Last edited by Aeon; 10-15-2011 at 07:18 PM.
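One detail in the quoted snippet can be isolated: `issue.translate(None, '-')` is the Python 2 string form for deleting characters, and `str.replace` does the same job on both Python 2 and Python 3. The sketch below uses the URL template from the quote; the function name is hypothetical, and this does not claim to fix the failure reported above.

```python
COVER_TEMPLATE = ('http://media.economist.com/sites/default/files/imagecache/'
                  'print-cover-full/print-covers/%s_CNA400.jpg')


def cover_url_from_dashed(issue):
    """Build the cover URL from a date written as 'YYYY-MM-DD'."""
    compact = issue.replace('-', '')  # '2011-09-10' becomes '20110910'
    return COVER_TEMPLATE % compact


print(cover_url_from_dashed('2011-09-10'))
```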
Old 10-17-2011, 09:23 AM   #9
pietvo
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
See https://www.mobileread.com/forums/sho...24&postcount=4
Old 11-04-2011, 11:21 AM   #10
awitko
Member
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
The economist recipe does not seem to work for me - is the standard recipe working for others? I notice there are two recipes in builtin. They are different as confirmed by a diff. I used the one with a username/password block but it does not work. I'd like to download the full current issues. If the capability for past issues can be included that would be great! Please do share if you have a working economist recipe.

Thanks in advance!

Alex
Old 11-28-2011, 10:09 AM   #11
ang002
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2011
Device: Kindle Touch
Also having trouble with previous issues of The Economist

PREFACE: I am completely new at this and have no experience with any kind of computer languages.

SITUATION: I downloaded calibre and inserted the recipe (attached as a *.TXT file, though originally a *.RECIPE file) in an attempt to download previous issues of The Economist with my paid subscription for the print edition. Despite my persistence and limited best efforts, I continue to receive the following ERROR message (below). Please note that I've tried downloading content from other news sources to verify that calibre is working on my PC, and it is. I've even had perfect success with the current issue of The Economist. It is when attempting to download previous issues that I am hitting a snag. Any help would be much appreciated, as I am a new owner of a Kindle and would love to read my only magazine subscription on it! Thank you!


calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Fetch news from The Economist (old issues)

Fetch news from The Economist (old issues)
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0,
'book_producer': None,
'change_justification': 'original',
'chapter': None,
'chapter_mark': 'pagebreak',
'comments': None,
'cover': None,
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_compress': False,
'dont_download_recipe': False,
'duplicate_links_in_toc': False,
'enable_heuristics': False,
'extra_css': None,
'extract_to': None,
'filter_css': None,
'fix_indents': True,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x05743890>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0,
'linearize_tables': False,
'lrf': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'mobi_ignore_margins': False,
'mobi_toc_at_start': False,
'no_chapters_in_toc': False,
'no_inline_navbars': True,
'no_inline_toc': False,
'output_profile': <calibre.customize.profiles.KindleOutput object at 0x05743BB0>,
'page_breaks_before': None,
'password': '[redacted]',
'personal_doc': '[PDOC]',
'prefer_author_sort': False,
'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': None,
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': '',
'rescale_images': False,
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': False,
'sr1_replace': '',
'sr1_search': '',
'sr2_replace': '',
'sr2_search': '',
'sr3_replace': '',
'sr3_search': '',
'tags': None,
'test': False,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_lines': True,
'use_auto_toc': False,
'username': '[redacted email address]',
'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
Could not find any articles. Has your subscription expired? (Error Code: 1)
Traceback (most recent call last):
  File "site.py", line 132, in main
  File "site.py", line 109, in run_entry_point
  File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
  File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
  File "site-packages\calibre\ebooks\conversion\plumber.py", line 959, in run
  File "site-packages\calibre\customize\conversion.py", line 204, in __call__
  File "site-packages\calibre\web\feeds\input.py", line 105, in convert
  File "site-packages\calibre\web\feeds\news.py", line 824, in download
  File "site-packages\calibre\web\feeds\news.py", line 968, in build_index
  File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 63, in parse_index
    return self.economist_parse_index()
  File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 118, in economist_parse_index
    raise Exception('Could not find any articles. Has your subscription expired?')
Exception: Could not find any articles. Has your subscription expired?
Attached Files
File Type: txt economist_old_issues.txt (5.2 KB, 435 views)