Register Guidelines E-Books Search Today's Posts Mark Forums Read

Go Back   MobileRead Forums > E-Book Software > Calibre > Recipes

Notices

Reply
 
Thread Tools Search this Thread
Old 11-12-2012, 09:45 PM   #1
Frescard
Enthusiast
Frescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with othersFrescard plays well with others
 
Posts: 29
Karma: 2910
Join Date: Aug 2012
Location: Montreal, Canada
Device: Sony PRS-T1
The New Yorker - fixed cover URL

After many weeks of experimenting, I finally found a reliable way of getting the current cover for The New Yorker:
Code:
def get_cover_url(self):
        cover_url = "http://www.newyorker.com/images/covers/1925/1925_02_21_p233.jpg"
        soup = self.index_to_soup('http://www.newyorker.com/magazine?intcid=magazine')
        cover_item = soup.find('div',attrs={'id':'media-count-1'})
        if cover_item:
           cover_url = 'http://www.newyorker.com' + cover_item.div.img['src'].strip()
        return cover_url
Also, for those who don't actually live in New York, the "Going On" articles are probably pretty useless, and only clutter up the download, so here's a version that suppresses those articles:
Spoiler:
Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Darko Miletic <darko.miletic at gmail.com>'
'''
newyorker.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class NewYorker(BasicNewsRecipe):
    title                 = 'The New Yorker'
    __author__            = 'Darko Miletic'
    description           = 'Free Articles'
    oldest_article        = 7
    language              = 'en'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    publisher             = 'Conde Nast Publications'
    category              = 'news'
    encoding              = 'cp1252'
    publication_type    = 'magazine'
    masthead_url        = 'http://www.newyorker.com/css/i/hed/logo.gif'
    extra_css             = """
                                body {font-family: "Times New Roman",Times,serif}
                                .articleauthor{color: #9F9F9F; 
                                               font-family: Arial, sans-serif;
                                               font-size: small; 
                                               text-transform: uppercase}
                                .rubric,.dd,h6#credit{color: #CD0021;
                                        font-family: Arial, sans-serif;
                                        font-size: small;
                                        text-transform: uppercase}
                                .descender:first-letter{display: inline; font-size: xx-large; font-weight: bold}
                                .dd,h6#credit{color: gray}
                                .c{display: block}
                                .caption,h2#articleintro{font-style: italic}
                                .caption{font-size: small}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [
                        dict(name='div', attrs={'class':'headers'})
                       ,dict(name='div', attrs={'id':['articleheads','items-container','articleRail','articletext','photocredits']})
                     ]
    remove_tags    = [
                         dict(name=['meta','iframe','base','link','embed','object'])
                        ,dict(attrs={'class':['utils','socialUtils','articleRailLinks','icons'] })
                        ,dict(attrs={'id':['show-header','show-footer'] })
                     ]
    remove_attributes = ['lang']
    feeds             = [
		(u'Reporting', u'http://www.newyorker.com/services/mrss/feeds/reporting.xml'),
		(u'Arts', u'http://www.newyorker.com/services/mrss/feeds/arts.xml'),
		(u'Humor',u'http://www.newyorker.com/services/mrss/feeds/humor.xml'),
		(u'Culture', u'http://www.newyorker.com/online/blogs/culture/rss.xml')
	]

    # remove unwanted articles
    def parse_feeds(self):

        # Call parent's method.
        feeds = BasicNewsRecipe.parse_feeds(self)

        # Loop through all feeds.
        for feed in feeds:

            # Loop through all articles in feed.
            for article in feed.articles[:]:

                # No "Goings on about town" articles
                if 'GOINGS ON' in article.title.upper():
                    feed.articles.remove(article)

        return feeds


    def print_version(self, url):
        return url + '?printable=true'

    def image_url_processor(self, baseurl, url):
        return url.strip()

    def get_cover_url(self):
        cover_url = "http://www.newyorker.com/images/covers/1925/1925_02_21_p233.jpg"
        soup = self.index_to_soup('http://www.newyorker.com/magazine?intcid=magazine')
        cover_item = soup.find('div',attrs={'id':'media-count-1'})
        if cover_item:
           cover_url = 'http://www.newyorker.com' + cover_item.div.img['src'].strip()
        return cover_url

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        auth = soup.find(attrs={'id':'articleauthor'})
        if auth:
           alink = auth.find('a')
           if alink and alink.string is not None:
              txt = alink.string
              alink.replaceWith(txt)
        return soup

To use this, instead of the default recipe, follow these steps:
  • Click on the arrow next to the "Fetch news" icon
  • Select "Add a custom news source"
  • Click the "Customize builtin recipe" button
  • Select the entry for The New Yorker, and click OK
  • Select the new entry from the left "user recipe" listbox
  • Replace the code now listed on the right side with the one posted above (beneath the spoiler tag)
  • Click the "Add/Update recipe" button
  • Confirm replacement with "Yes"
  • Close the custom recipe window, and confirm with "Yes"
The modified recipe will now show up under "Custom" in the news download section, and can be scheduled from there (after which it will also be shown in the "Scheduled" group).
Frescard is offline   Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
London Review of Books - fixed cover URL Frescard Recipes 0 11-05-2012 07:54 PM
Kindle 3 M-Edge New Yorker Cover with Light for $9.99 shipped Boston Deals and Resources (No Self-Promotion or Affiliate Links) 44 07-02-2012 03:01 PM
M-Edge New Yorker Cover with Light for $9.99 Gardenman iRiver Story 1 04-17-2012 10:36 AM
Lost trying to get url for time.com cover Olger Recipes 3 02-14-2012 09:04 AM
New Yorker Magazine Cover - Fiction Edition poohbear_nc News 1 06-06-2009 04:56 AM


All times are GMT -4. The time now is 10:44 PM.


MobileRead.com is a privately owned, operated and funded community.