03-15-2011, 06:17 PM | #1 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
|
Downloading older Economist issues
I have a subscription to The Economist, which gives me access to historical print issues.
I wanted to create a customized recipe, based on the built-in one, that lets me pass a specific date to INDEX so that it generates the correct URL for an older issue. I know the URL I need to set INDEX to; what I'm struggling with is whether there is a way to get calibre to prompt me for the date I'm interested in. I looked at the NY Times recipe because it has a login section, but I'm still trying to figure out how to accomplish this. Can anyone help? I'm still learning how to customize recipes. Thanks in advance. |
03-15-2011, 06:57 PM | #2 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Just overload the username field: pass in the date/issue number in the username and have the login code strip it out.
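A minimal sketch of what "overloading" the username field could look like in plain Python. The `login|YYYYMMDD` layout, the `|` separator, and the helper name `split_username` are assumptions made for illustration; they are not a calibre convention and do not come from the built-in recipe.

```python
def split_username(raw, sep='|'):
    """Split an overloaded username field into (login, issue_date).

    Hypothetical helper: the 'login|YYYYMMDD' layout is an assumption
    for this sketch. If no separator is present, the whole field is
    treated as the issue date and the login part is empty.
    """
    login, found, issue_date = raw.rpartition(sep)
    if not found:
        # No separator in the field: everything is the issue date.
        return '', raw.strip()
    return login.strip(), issue_date.strip()


login, issue_date = split_username('reader@example.com|20110305')
print(login, issue_date)
```

Inside a recipe, the login code would then send only the login part to the site and keep the date for building the issue URL.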
|
03-17-2011, 01:32 PM | #3 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
|
OK, thanks for the tip.
I'm a bit in over my head on this one. I'm still learning calibre and have never programmed in Python (but I understand objects, etc.). I cannot get self.username to be recognized, so could you or someone give me a bit more guidance? In The Economist recipe, there is an attribute that holds the index URL (INDEX). I need to append the contents of self.username (where I would put a date in the format 20110305, for example) to INDEX. So I need to end up with INDEX + self.username, but I'm struggling with where to do that in the recipe. Thanks for any help you can provide. (BTW, I do have the needs_subscription attribute set to True, and I do get the prompt in the calibre interface.) |
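The concatenation described here can be sketched outside calibre as a small function. The INDEX value is the one used in this thread; the function name `build_index_url` and the validation step are assumptions added for illustration. In a recipe, the same expression would use `self.INDEX` and `self.username` inside a method, since both are only reachable through `self`.

```python
from datetime import datetime

# Index URL prefix as used in this thread's recipe.
INDEX = 'http://www.economist.com/printedition/index.cfm?d='


def build_index_url(issue_date, base=INDEX):
    """Return base + issue_date after checking the YYYYMMDD format.

    Hypothetical helper, not part of calibre. Raises ValueError when
    the string is not eight digits or is not a real calendar date.
    """
    if len(issue_date) != 8 or not issue_date.isdigit():
        raise ValueError('expected a date as YYYYMMDD, got %r' % issue_date)
    # strptime rejects impossible dates such as month 13.
    datetime.strptime(issue_date, '%Y%m%d')
    return base + issue_date


print(build_index_url('20110305'))
```

Checking the format before building the URL gives a clearer error than a failed download when something other than a date ends up in the username field.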
03-17-2011, 01:40 PM | #4 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Look at economist_parse_index
|
03-21-2011, 08:29 PM | #5 |
Junior Member
Posts: 9
Karma: 10
Join Date: Feb 2011
Device: Kindle
|
Thanks for your help.
I've finally been able to get this recipe to do what I wanted. My apologies if I'm violating Python/calibre good practices, but I figured I'd share this in case someone else has a digital or print subscription to The Economist and wants to download older issues. In the username field, enter a date in the format YYYYMMDD, and in the password field just enter a single character. The date must correspond to the issue date printed on The Economist issue. (Also, I believe you need to be logged into your Economist account in a browser first, so that a cookie gets created and remembers you, but I'm not 100% sure this is how it works.) In any case, QUESTION: is there a way to set the Publish Date in the recipe to the actual issue date? Anyhow, here it is in case someone wants it. Cheers. ================== Code:
#!/usr/bin/env python

__license__   = 'GPL v3'
__copyright__ = '2008, Kovid Goyal <kovid at kovidgoyal.net>'
'''
economist.com
'''
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

import string, time, re

class Economist(BasicNewsRecipe):

    title = 'The Economist (old issues)'
    language = 'en'

    __author__ = "Kovid Goyal"
    INDEX = 'http://www.economist.com/printedition/index.cfm?d='
    description = ' - Global news and current affairs from a European perspective.'

    oldest_article = 7.0
    economist_cover_url = 'http://www.economist.com/images/images-magazine/'
    cover_url = None
    cover_suffix = '_CNA400.jpg'
    remove_tags = [
        dict(name=['script', 'noscript', 'title', 'iframe', 'cf_floatingcontent']),
        dict(attrs={'class':['dblClkTrk', 'ec-article-info', 'share_inline_header']}),
        {'class': lambda x: x and 'share-links-header' in x},
    ]
    keep_only_tags = [dict(id='ec-article-body')]
    needs_subscription = True
    no_stylesheets = True
    preprocess_regexps = [(re.compile('</html>.*', re.DOTALL),
        lambda x:'</html>')]

    '''
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open('http://www.economist.com')
        req = mechanize.Request(
            'http://www.economist.com/members/members.cfm?act=exec_login',
            headers = {
                'Referer':'http://www.economist.com/',
            },
            data=urllib.urlencode({
                'logging_in' : 'Y',
                'returnURL' : '/',
                'email_address': self.username,
                'fakepword' : 'Password',
                'pword' : self.password,
                'x' : '0',
                'y' : '0',
            }))
        br.open(req).read()
        return br
    '''

    def get_cover_url(self):
        issuedate = self.username
        self.log.info('Issuedate is: ' + issuedate)
        iyr = issuedate[0:4]
        imo = issuedate[4:-2]
        idt = issuedate[6:]
        self.cover_url = self.economist_cover_url + iyr + '/' + imo + '/' + idt + '/CN/' + issuedate + self.cover_suffix
        self.log.info('Cover url is: ' + self.cover_url)
        return self.cover_url

    def get_masthead_title(self):
        issuedate = self.username
        iyr = issuedate[0:4]
        imo = issuedate[4:-2]
        idt = issuedate[6:]
        self.title = 'The Economist (' + iyr + '/' + imo + '/' + idt + ')'
        return self.title

    def parse_index(self):
        try:
            return self.economist_parse_index()
        except:
            raise
            self.log.warn(
                'Initial attempt to parse index failed, retrying in 30 seconds')
            time.sleep(30)
            return self.economist_parse_index()

    def economist_parse_index(self):
        self.INDEX = self.INDEX + self.username
        soup = BeautifulSoup(self.browser.open(self.INDEX).read(),
                             convertEntities=BeautifulSoup.HTML_ENTITIES)
        index_started = False
        feeds = {}
        ans = []
        key = None
        for tag in soup.findAll(['h1', 'h2']):
            text = ''.join(tag.findAll(text=True))
            if tag.name in ('h1', 'h2') and 'Classified ads' in text:
                break
            if tag.name == 'h1':
                if 'The world this week' in text or 'The world this year' in text:
                    index_started = True
                if not index_started:
                    continue
                text = string.capwords(text)
                if text not in feeds.keys():
                    feeds[text] = []
                if text not in ans:
                    ans.append(text)
                key = text
                continue
            if key is None:
                continue
            a = tag.find('a', href=True)
            if a is not None:
                url=a['href']
                id_ = re.search(r'story_id=(\d+)', url).group(1)
                url = 'http://www.economist.com/node/%s/print'%id_
                if url.startswith('Printer'):
                    url = '/'+url
                if url.startswith('/'):
                    url = 'http://www.economist.com' + url
                try:
                    subtitle = tag.previousSibling.contents[0].contents[0]
                    text = subtitle + ': ' + text
                except:
                    pass
                article = dict(title=text,
                    url = url,
                    description='', content='', date='')
                feeds[key].append(article)

        ans = [(key, feeds[key]) for key in ans if feeds.has_key(key)]
        if not ans:
            raise Exception('Could not find any articles. Has your subscription expired?')
        return ans

    def eco_find_image_tables(self, soup):
        for x in soup.findAll('table', align=['right', 'center']):
            if len(x.findAll('font')) in (1,2) and len(x.findAll('img')) == 1:
                yield x

    def postprocess_html(self, soup, first):
        body = soup.find('body')
        for name, val in body.attrs:
            del body[name]

        for table in list(self.eco_find_image_tables(soup)):
            caption = table.find('font')
            img = table.find('img')
            div = Tag(soup, 'div')
            div['style'] = 'text-align:left;font-size:70%'
            ns = NavigableString(self.tag_to_string(caption))
            div.insert(0, ns)
            div.insert(1, Tag(soup, 'br'))
            del img['width']
            del img['height']
            img.extract()
            div.insert(2, img)
            table.replaceWith(div)
        return soup

Last edited by kovidgoyal; 03-21-2011 at 08:49 PM. |
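The date handling in get_cover_url and get_masthead_title can be tried outside calibre with a short sketch like the one below. The URL prefix and suffix are copied from the recipe above; the helper names (`issue_parts`, `cover_url_for`, `title_for`) are made up for illustration. For an eight-character date, the recipe's `issuedate[4:-2]` selects the same two characters as `[4:6]` here.

```python
from datetime import datetime

# Values copied from the recipe posted in this thread.
COVER_BASE = 'http://www.economist.com/images/images-magazine/'
COVER_SUFFIX = '_CNA400.jpg'


def issue_parts(issue_date):
    """Split 'YYYYMMDD' into (year, month, day) strings.

    Hypothetical helper; raises ValueError for malformed dates.
    """
    if len(issue_date) != 8 or not issue_date.isdigit():
        raise ValueError('expected YYYYMMDD, got %r' % issue_date)
    datetime.strptime(issue_date, '%Y%m%d')  # rejects impossible dates
    return issue_date[0:4], issue_date[4:6], issue_date[6:8]


def cover_url_for(issue_date):
    """Build the cover image URL the same way get_cover_url does."""
    year, month, day = issue_parts(issue_date)
    return (COVER_BASE + year + '/' + month + '/' + day
            + '/CN/' + issue_date + COVER_SUFFIX)


def title_for(issue_date):
    """Build the title string the same way get_masthead_title does."""
    return 'The Economist (%s/%s/%s)' % issue_parts(issue_date)


print(cover_url_for('20110305'))
print(title_for('20110305'))
```

On the Publish Date question: `datetime.strptime(issue_date, '%Y%m%d')` gives a datetime object for the issue date, which is the natural value to hand over if the calibre version in use offers a way for a recipe to set it. This thread does not confirm such a hook.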
08-30-2011, 05:17 AM | #6 |
Junior Member
Posts: 1
Karma: 10
Join Date: Aug 2011
Device: Kindle
|
I am also trying to download older Economist issues using calibre and am getting the error message asking if my subscription has expired (it hasn't). Downloading the most recent issue works perfectly, but I can't figure out how to do older issues. Any help would be very much appreciated. Wonderful tool you've created!
|
10-15-2011, 05:22 PM | #7 |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2011
Device: Kindle 4
|
How do I go about modifying recipes with the OSX version of calibre?
EDIT: After a good 15 minutes of looking everywhere to no avail, I tried a simple "right-click" on the Fetch News button. My bad. |
10-15-2011, 06:45 PM | #8 | |
Junior Member
Posts: 4
Karma: 10
Join Date: Oct 2011
Device: Kindle 4
|
The script provided by partymonkey does not work for me.
It would be very nice to have a standard recipe for fetching older issues. I'm positive it would be a quick fix to add this to the current recipe.
Last edited by Aeon; 10-15-2011 at 07:18 PM. |
|
10-17-2011, 09:23 AM | #9 |
Reader
Posts: 519
Karma: 24612
Join Date: Aug 2009
Location: Utrecht, NL
Device: Kobo Aura 2, iPhone, iPad
|
|
11-04-2011, 11:21 AM | #10 |
Member
Posts: 23
Karma: 10
Join Date: Oct 2011
Device: Kindle
|
The Economist recipe does not seem to work for me. Is the standard recipe working for others? I notice there are two recipes among the built-ins, and a diff confirms they are different. I used the one with a username/password block, but it does not work. I'd like to download the full current issues. If the capability for past issues can be included, that would be great! Please do share if you have a working Economist recipe.
Thanks in advance! Alex |
11-28-2011, 10:09 AM | #11 |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2011
Device: Kindle Touch
|
Also having trouble with previous issues of The Economist
PREFACE: I am completely new at this and have no experience with any kind of computer languages.
SITUATION: I downloaded calibre and inserted the recipe (attached as a *.TXT file, originally a *.RECIPE file) in an attempt to download previous issues of The Economist with my paid subscription for the print edition. Despite my persistence and limited best efforts, I continue to receive the following ERROR message (below). Please note that I've tried downloading content from other news sources to verify that calibre is working on my PC, and it is. I've even had perfect success with the current issue of The Economist. It is when attempting to download previous issues that I am hitting a snag. Any help would be much appreciated, as I am a new owner of a Kindle and would love to read my only magazine subscription on it! Thank you!
calibre, version 0.8.28
ERROR: Conversion Error: <b>Failed</b>: Fetch news from The Economist (old issues)
Fetch news from The Economist (old issues)
Resolved conversion options
calibre version: 0.8.28
{'asciiize': False, 'author_sort': None, 'authors': None, 'base_font_size': 0, 'book_producer': None, 'change_justification': 'original', 'chapter': None, 'chapter_mark': 'pagebreak', 'comments': None, 'cover': None, 'debug_pipeline': None, 'dehyphenate': True, 'delete_blank_paragraphs': True, 'disable_font_rescaling': False, 'dont_compress': False, 'dont_download_recipe': False, 'duplicate_links_in_toc': False, 'enable_heuristics': False, 'extra_css': None, 'extract_to': None, 'filter_css': None, 'fix_indents': True, 'font_size_mapping': None, 'format_scene_breaks': True, 'html_unwrap_factor': 0.4, 'input_encoding': None, 'input_profile': <calibre.customize.profiles.InputProfile object at 0x05743890>, 'insert_blank_line': False, 'insert_blank_line_size': 0.5, 'insert_metadata': False, 'isbn': None, 'italicize_common_cases': True, 'keep_ligatures': False, 'language': None, 'level1_toc': None, 'level2_toc': None, 'level3_toc': None, 'line_height': 0, 'linearize_tables': False, 'lrf': False, 'margin_bottom': 5.0, 'margin_left': 5.0, 'margin_right': 5.0, 'margin_top': 5.0, 'markup_chapter_headings': True, 'max_toc_links': 50, 'minimum_line_height': 120.0, 'mobi_ignore_margins': False, 'mobi_toc_at_start': False, 'no_chapters_in_toc': False, 'no_inline_navbars': True, 'no_inline_toc': False, 'output_profile': <calibre.customize.profiles.KindleOutput object at 0x05743BB0>, 'page_breaks_before': None, 'password': '<redacted>', 'personal_doc': '[PDOC]', 'prefer_author_sort': False, 'prefer_metadata_cover': False, 'pretty_print': False, 'pubdate': None, 'publisher': None, 'rating': None, 'read_metadata_from_opf': None, 'remove_fake_margins': True, 'remove_first_image': False, 'remove_paragraph_spacing': False, 'remove_paragraph_spacing_indent_size': 1.5, 'renumber_headings': True, 'replace_scene_breaks': '', 'rescale_images': False, 'series': None, 'series_index': None, 'share_not_sync': False, 'smarten_punctuation': False, 'sr1_replace': '', 'sr1_search': '', 'sr2_replace': '', 'sr2_search': '', 'sr3_replace': '', 'sr3_search': '', 'tags': None, 'test': False, 'timestamp': None, 'title': None, 'title_sort': None, 'toc_filter': None, 'toc_threshold': 6, 'toc_title': None, 'unsmarten_punctuation': False, 'unwrap_lines': True, 'use_auto_toc': False, 'username': '<redacted e-mail address>', 'verbose': 2}
InputFormatPlugin: Recipe Input running
Python function terminated unexpectedly
Could not find any articles. Has your subscription expired? (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 132, in main
File "site.py", line 109, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 187, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 959, in run
File "site-packages\calibre\customize\conversion.py", line 204, in __call__
File "site-packages\calibre\web\feeds\input.py", line 105, in convert
File "site-packages\calibre\web\feeds\news.py", line 824, in download
File "site-packages\calibre\web\feeds\news.py", line 968, in build_index
File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 63, in parse_index
return self.economist_parse_index()
File "c:\users\t&a\appdata\local\temp\calibre_0.8.28_tmp_qjzabk\0gzeze_recipes\recipe0.py", line 118, in economist_parse_index
raise Exception('Could not find any articles. Has your subscription expired?')
Exception: Could not find any articles. Has your subscription expired? |
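One possible reading of this log: the options dump shows an e-mail address in the username field, while the recipe posted earlier in the thread appends the username to INDEX and expects a YYYYMMDD date there. A URL built from an e-mail address is unlikely to return an issue page, and the recipe raises this exception whenever its heading scan collects nothing. The sketch below mimics that scan on plain `(tag_name, text)` pairs instead of parsed HTML. The function name and the simplified input are assumptions for illustration, and the real recipe additionally requires each h2 to contain a link.

```python
import string


def collect_sections(headings):
    """Group h2 texts under h1 sections, like economist_parse_index.

    headings: list of (tag_name, text) pairs in page order.
    Simplified stand-in for the recipe's loop over soup.findAll.
    """
    index_started = False
    sections = []   # section titles in page order
    articles = {}   # section title -> list of article titles
    key = None
    for name, text in headings:
        if 'Classified ads' in text:
            break
        if name == 'h1':
            if 'The world this week' in text or 'The world this year' in text:
                index_started = True
            if not index_started:
                continue  # ignore headings before the issue index starts
            key = string.capwords(text)
            if key not in articles:
                articles[key] = []
                sections.append(key)
            continue
        if key is None:
            continue  # h2 seen before any accepted h1
        articles[key].append(text)
    return [(k, articles[k]) for k in sections]


# A page without the expected headings yields nothing, which is the
# condition under which the recipe raises 'Could not find any articles'.
print(collect_sections([('h1', 'Log in'), ('h2', 'Forgot your password?')]))
```

Entering the issue date (for example 20110305) in the username field, as described by the recipe's author, is the first thing to check before suspecting the subscription itself.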
|