Old 12-28-2012, 11:20 PM   #1
rainrdx
Connoisseur
Posts: 54
Karma: 13316
Join Date: Jul 2012
Device: iPad
Harper's Print Edition recipe update

This is an update to Darko Miletic's great work. My changes cover the cover image processing, finding the current issue, and a few minor things.

I changed the title to "Harper's Magazine - Printed Edition" to mark this as a fork, mostly because I didn't get a chance to get in touch with the original author.

R


Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2012, Darko Miletic <darko.miletic at gmail.com>'
'''
harpers.org - paid subscription / printed issue articles
This recipe only gets articles published in text format;
images and PDFs are ignored.
If you have an institutional subscription based on access IP, you do not need to enter
anything in the username/password fields.
'''

import re
import time
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Harpers_full(BasicNewsRecipe):
    title                 = "Harper's Magazine - Printed Edition"
    __author__            = 'Darko Miletic'
    description           = "Harper's Magazine, the oldest general-interest monthly in America, explores the issues that drive our national conversation, through long-form narrative journalism and essays, and such celebrated features as the iconic Harper's Index."
    publisher             = "Harper's"
    category              = 'news, politics, USA'
    oldest_article        = 30
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    delay                 = 1
    language              = 'en'
    encoding              = 'utf8'
    needs_subscription    = 'optional'
    masthead_url          = 'http://harpers.org/wp-content/themes/harpers/images/pheader.gif'
    publication_type      = 'magazine'
    INDEX                 = ''
    LOGIN                 = 'http://harpers.org/wp-content/themes/harpers/ajax_login.php'
    extra_css             = """
                                body{font-family: adobe-caslon-pro,serif}
                                .category{font-size: small}
                                .articlePost p:first-letter{display: inline; font-size: xx-large; font-weight: bold}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [ dict(name='div', attrs={'class':['postdetailFull','articlePost']}) ]
    remove_tags = [
                     dict(name='div', attrs={'class':'fRight rightDivPad'})
                    ,dict(name=['link','meta','object','embed','iframe'])
                  ]
    remove_attributes=['xmlns']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.open('http://harpers.org/')
        if self.username is not None and self.password is not None:
            # time.localtime() returns a struct_time, not a number; use epoch milliseconds
            tt = int(time.time() * 1000)
            data = urllib.urlencode({ 'm':self.username
                                     ,'p':self.password
                                     ,'rt':'http://harpers.org/'
                                     ,'tt':tt
                                   })
            br.open(self.LOGIN, data)
        return br

    def parse_index(self):
        # find the current issue
        soup = self.index_to_soup('http://harpers.org/')
        currentIssue = soup.find('div', attrs={'class':'mainNavi'}).find('li', attrs={'class':'curentIssue'})
        currentIssue_url = self.tag_to_string(currentIssue.a['href'])
        self.log(currentIssue_url)

        # go to the current issue
        soup1 = self.index_to_soup(currentIssue_url)
        date = re.split(r'\s\|\s', self.tag_to_string(soup1.head.title.string))[0]
        self.timefmt = u' [%s]' % date

        # get the cover
        coverurl = 'http://harpers.org/wp-content/themes/harpers/ajax_microfiche.php?img=harpers-' + re.split('harpers.org/', currentIssue_url)[1] + 'gif/0001.gif'
        soup2 = self.index_to_soup(coverurl)
        self.cover_url = self.tag_to_string(soup2.find('img')['src'])
        self.log(self.cover_url)
        articles = []
        count = 0
        for item in soup1.findAll('div', attrs={'class':'articleData'}):
            text_links = item.findAll('h2')
            for text_link in text_links:
                if count == 0:
                   count = 1
                else:
                   url   = text_link.a['href']
                   title = text_link.a.contents[0]
                   date  = strftime(' %B %Y')
                   articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':''
                                    })
        return [(soup1.head.title.string, articles)]

    def print_version(self, url):
        return url + '?single=1'

    def cleanup(self):
        soup = self.index_to_soup('http://harpers.org/')
        signouturl = self.tag_to_string(soup.find('li', attrs={'class':'subLogOut'}).findNext('li').a['href'])
        self.log(signouturl)
        self.browser.open(signouturl)
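
To make the string handling in parse_index easier to follow, here is a minimal standalone sketch of how the issue date and the cover lookup URL are derived. The sample page title and issue URL are hypothetical stand-ins; the real values come from harpers.org at run time and may look different.

```python
import re

# Hypothetical sample values (assumptions, not taken from the live site).
page_title = "January 2013 | Harper's Magazine"
issue_url = "http://harpers.org/archive/2013/01/"

# Same split as in the recipe: keep the text before the first " | ".
issue_date = re.split(r'\s\|\s', page_title)[0]
timefmt = u' [%s]' % issue_date

# Same idea as in the recipe: reuse the path after "harpers.org/"
# when building the microfiche lookup URL for the cover.
issue_path = re.split(r'harpers\.org/', issue_url)[1]
cover_page = ('http://harpers.org/wp-content/themes/harpers/ajax_microfiche.php'
              '?img=harpers-' + issue_path + 'gif/0001.gif')

print(timefmt)
print(cover_page)
```

If the page title ever stops using " | " as a separator, the split returns the whole title, so the date shown in the book title would simply be the full page title.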
Old 12-29-2012, 11:51 AM   #2
kiklop74
Guru
Posts: 779
Karma: 194642
Join Date: Dec 2007
Location: Argentina
Device: Kindle PaperWhite, Motorola Xoom
Dude, I already posted this update to the calibre bug tracker.
Old 03-25-2013, 06:30 PM   #3
rainrdx
Update: fixed cover image.

Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2012, Darko Miletic <darko.miletic at gmail.com>'
'''
harpers.org - paid subscription / printed issue articles
This recipe only gets articles published in text format;
images and PDFs are ignored.
If you have an institutional subscription based on access IP, you do not need to enter
anything in the username/password fields.
'''

import re
import time
import urllib
from calibre import strftime
from calibre.web.feeds.news import BasicNewsRecipe

class Harpers_full(BasicNewsRecipe):
    title                 = "Harper's Magazine - Printed Edition"
    __author__            = 'Darko Miletic'
    description           = "Harper's Magazine, the oldest general-interest monthly in America, explores the issues that drive our national conversation, through long-form narrative journalism and essays, and such celebrated features as the iconic Harper's Index."
    publisher             = "Harper's"
    category              = 'news, politics, USA'
    oldest_article        = 30
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    delay                 = 1
    language              = 'en'
    encoding              = 'utf8'
    needs_subscription    = 'optional'
    masthead_url          = 'http://harpers.org/wp-content/themes/harpers/images/pheader.gif'
    publication_type      = 'magazine'
    INDEX                 = ''
    LOGIN                 = 'http://harpers.org/wp-content/themes/harpers/ajax_login.php'
    extra_css             = """
                                body{font-family: adobe-caslon-pro,serif}
                                .category{font-size: small}
                                .articlePost p:first-letter{display: inline; font-size: xx-large; font-weight: bold}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [ dict(name='div', attrs={'class':['postdetailFull','articlePost']}) ]
    remove_tags = [
                     dict(name='div', attrs={'class':'fRight rightDivPad'})
                    ,dict(name=['link','meta','object','embed','iframe'])
                  ]
    remove_attributes=['xmlns']

    def get_browser(self):
        br = BasicNewsRecipe.get_browser(self)
        br.open('http://harpers.org/')
        if self.username is not None and self.password is not None:
            # time.localtime() returns a struct_time, not a number; use epoch milliseconds
            tt = int(time.time() * 1000)
            data = urllib.urlencode({ 'm':self.username
                                     ,'p':self.password
                                     ,'rt':'http://harpers.org/'
                                     ,'tt':tt
                                   })
            br.open(self.LOGIN, data)
        return br

    def parse_index(self):
        # find the current issue
        soup = self.index_to_soup('http://harpers.org/')
        currentIssue = soup.find('div', attrs={'class':'mainNavi'}).find('li', attrs={'class':'curentIssue'})
        currentIssue_url = self.tag_to_string(currentIssue.a['href'])

        # go to the current issue
        soup1 = self.index_to_soup(currentIssue_url)
        date = re.split(r'\s\|\s', self.tag_to_string(soup1.head.title.string))[0]
        self.timefmt = u' [%s]' % date

        # get the cover
        self.cover_url = soup1.find('div', attrs={'class':'picture_hp'}).find('img', src=True)['src']

        articles = []
        count = 0
        for item in soup1.findAll('div', attrs={'class':'articleData'}):
            text_links = item.findAll('h2')
            for text_link in text_links:
                if count == 0:
                   count = 1
                else:
                   url   = text_link.a['href']
                   title = text_link.a.contents[0]
                   date  = strftime(' %B %Y')
                   articles.append({
                                      'title'      :title
                                     ,'date'       :date
                                     ,'url'        :url
                                     ,'description':''
                                    })
        return [(soup1.head.title.string, articles)]

    def print_version(self, url):
        return url + '?single=1'

    def cleanup(self):
        soup = self.index_to_soup('http://harpers.org/')
        signouturl = self.tag_to_string(soup.find('li', attrs={'class':'subLogOut'}).findNext('li').a['href'])
        self.log(signouturl)
        self.browser.open(signouturl)
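
For anyone reading get_browser, here is a rough standalone sketch of the login form data it posts. It assumes the `tt` field is meant to be a millisecond timestamp, which is a guess from the `*1000`; since `time.localtime()` returns a struct_time rather than a number, the sketch uses `time.time()`. The helper name `build_login_data` and the sample credentials are made up for illustration.

```python
import time

try:
    from urllib import urlencode          # Python 2, as used by the recipe
except ImportError:
    from urllib.parse import urlencode    # Python 3

def build_login_data(username, password, now=None):
    """Hypothetical helper: encode the form fields the recipe sends to LOGIN."""
    if now is None:
        now = time.time()
    tt = int(now * 1000)  # milliseconds since the epoch (assumed meaning of 'tt')
    # A list of pairs keeps the field order stable across Python versions.
    return urlencode([('m', username),
                      ('p', password),
                      ('rt', 'http://harpers.org/'),
                      ('tt', tt)])

# Fixed 'now' so the result is reproducible.
data = build_login_data('reader@example.com', 'secret', now=1356754800.0)
print(data)
```

The resulting string is what would be passed as the second argument to `br.open(self.LOGIN, data)`.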
Old 03-29-2013, 09:17 PM   #4
kiklop74
Please stop posting code with the cleanup part. There is absolutely no need to perform a logout. It's just a waste of resources.
Old 03-30-2013, 06:44 AM   #5
kiklop74
Also, your update, again, was based on an older version of the code. In future, either make your edits on the version of the recipe shipped with calibre (not the custom one you have) or just submit the changes to me and I'll do it.
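
One possible way to send only the changes is a unified diff against the shipped recipe. The sketch below uses Python's standard difflib; the one-line excerpts and the file labels are hypothetical placeholders, not the actual shipped file.

```python
import difflib

# Hypothetical one-line excerpts standing in for the shipped recipe and a local edit.
shipped = ["        br = BasicNewsRecipe.get_browser()\n"]
edited = ["        br = BasicNewsRecipe.get_browser(self)\n"]

# The file labels are placeholders chosen for this example.
patch = ''.join(difflib.unified_diff(shipped, edited,
                                     fromfile='recipe (shipped)',
                                     tofile='recipe (edited)'))
print(patch)
```

In practice the two lists would come from calling `readlines()` on the shipped recipe file and on the edited copy.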
Old 04-04-2013, 10:38 AM   #6
rainrdx
Sure, I will. Sorry.
All times are GMT -4. The time now is 01:45 PM.


MobileRead.com is a privately owned, operated and funded community.