![]() |
#1 |
Connoisseur
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 55
Karma: 13316
Join Date: Jul 2012
Device: iPad
|
Time Magazine Recipe
I kinda wondered why this recipe is broken for a long while so I took some time and fixed it up.
Code:
#!/usr/bin/env python __license__ = 'GPL v3' __copyright__ = '2008, Kovid Goyal <kovid@kovidgoyal.net>' ''' time.com ''' import re from calibre.web.feeds.news import BasicNewsRecipe from lxml import html class Time(BasicNewsRecipe): #recipe_disabled = ('This recipe has been disabled as TIME no longer' # ' publish complete articles on the web.') title = u'Time' __author__ = 'Kovid Goyal, Rick Shang' description = ('Weekly US magazine.') encoding = 'utf-8' no_stylesheets = True language = 'en' remove_javascript = True needs_subscription = True keep_only_tags = [ { 'class':['tout1', 'entry-content', 'external-gallery-img', 'image-meta'] }, ] remove_tags = [ {'class':['thumbnail', 'button']}, ] recursions = 10 match_regexps = [r'/[0-9,]+-(2|3|4|5|6|7|8|9)(,\d+){0,1}.html',r'http://www.time.com/time/specials/packages/article/.*'] preprocess_regexps = [(re.compile( r'<meta .+/>'), lambda m:'')] def get_browser(self): br = BasicNewsRecipe.get_browser(self) # This site uses javascript in its login process if self.username is not None and self.password is not None: br.open('http://www.time.com/time/magazine') br.select_form(predicate=lambda f: 'action' in f.attrs and f.attrs['action'] == 'https://auth.time.com/login.php') br['username'] = self.username br['password'] = self.password br['magcode'] = ['TD'] br.find_control('turl').readonly = False br['turl'] = 'http://www.time.com/time/magazine' br.find_control('rurl').readonly = False br['rurl'] = 'http://www.time.com/time/magazine' br['remember'] = False br.submit() return br def parse_index(self): raw = self.index_to_soup('http://www.time.com/time/magazine', raw=True) soup = self.index_to_soup('http://www.time.com/time/magazine') dates = self.tag_to_string(soup.find('time', attrs={'class':'updated'})) self.timefmt = ' [%s]'%dates root = html.fromstring(raw) img = root.xpath('//a[.="View Large Cover" and @href]') if img: cover_url = 'http://www.time.com' + img[0].get('href') try: nsoup = self.index_to_soup(cover_url) img = nsoup.find('img', src=re.compile('archive/covers')) if img is not None: self.cover_url = img['src'] except: self.log.exception('Failed to fetch cover') feeds = [] parent = root.xpath('//div[@class="content-main-aside"]')[0] for sec in parent.xpath( 'descendant::section[contains(@class, "sec-mag-section")]'): h3 = sec.xpath('./h3') if h3: section = html.tostring(h3[0], encoding=unicode, method='text').strip().capitalize() self.log('Found section', section) articles = list(self.find_articles(sec)) if articles: feeds.append((section, articles)) return feeds def find_articles(self, sec): for article in sec.xpath('./article'): h2 = article.xpath('./*[@class="entry-title"]') if not h2: continue a = h2[0].xpath('./a[@href]') if not a: continue title = html.tostring(a[0], encoding=unicode, method='text').strip() if not title: continue url = re.sub('/magazine/article/0,9171','/subscriber/printout/0,8816',a[0].get('href')) if url.startswith('/'): url = 'http://www.time.com'+url desc = '' p = article.xpath('./*[@class="entry-content"]') if p: desc = html.tostring(p[0], encoding=unicode, method='text') self.log('\t', title, ':\n\t\t', desc) yield { 'title' : title, 'url' : url, 'date' : '', 'description' : desc } Last edited by rainrdx; 08-24-2012 at 01:20 PM. Reason: A minor update |
![]() |
![]() |
![]() |
#2 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,160
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Your log in code doesn't seem to work. You can check for a successful login with
Code:
raw = br.submit().read() if '>Log Out<' not in raw: raise ValueError('Failed to login to time.com, check' ' your username and password') |
![]() |
![]() |
Advert | |
|
![]() |
#3 | |
Connoisseur
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 55
Karma: 13316
Join Date: Jul 2012
Device: iPad
|
Quote:
My log in code does not seem to work by your code, but when I use the browser instance to open the subscription only page, ("subscriber/printout"), the subscription only context does come up. It's probably some cookie issue that I am unable to tackle. But for practical purposes I find it's ok. |
|
![]() |
![]() |
![]() |
#4 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,160
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Does it work even if you supply an incorrect username/password?
|
![]() |
![]() |
![]() |
#5 |
Connoisseur
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 55
Karma: 13316
Join Date: Jul 2012
Device: iPad
|
|
![]() |
![]() |
Advert | |
|
![]() |
#6 |
creator of calibre
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 45,160
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
OK, I've updated the recipe, but I doubt that the numbers in the url you replace will remain stable over time.
|
![]() |
![]() |
![]() |
#7 |
Connoisseur
![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() ![]() Posts: 55
Karma: 13316
Join Date: Jul 2012
Device: iPad
|
|
![]() |
![]() |
![]() |
|
![]() |
||||
Thread | Thread Starter | Forum | Replies | Last Post |
Recipe for Microwave and RF magazine | kiavash | Recipes | 1 | 03-14-2012 07:32 PM |
Time Magazine | figgarro | Recipes | 1 | 10-22-2011 09:54 PM |
Recipe for Caijing Magazine (zh-CN) | gzeric | Recipes | 2 | 08-19-2011 04:59 PM |
TIME magazine recipe broken? | alexxx | Recipes | 0 | 07-12-2011 03:45 AM |
Recipe for Self Magazine (US) (need help) | xXxXxXxXxXx | Recipes | 1 | 05-17-2011 05:16 PM |