11-12-2012, 09:45 PM   #1
Frescard
The New Yorker - fixed cover URL

After many weeks of experimenting, I finally found a reliable way of getting the current cover for The New Yorker:
Code:
def get_cover_url(self):
    # Fallback cover, used when the current one cannot be located.
    cover_url = "http://www.newyorker.com/images/covers/1925/1925_02_21_p233.jpg"
    soup = self.index_to_soup('http://www.newyorker.com/magazine?intcid=magazine')
    # The current cover image sits inside the div with id "media-count-1".
    cover_item = soup.find('div', attrs={'id': 'media-count-1'})
    if cover_item:
        cover_url = 'http://www.newyorker.com' + cover_item.div.img['src'].strip()
    return cover_url
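
The snippet above builds the address by plain string concatenation, which assumes the src attribute is always a site-relative path. As a rough sketch only (not part of the recipe, and assuming Python 3, where urljoin lives in urllib.parse), the same step could be written so that an absolute src or a missing src is handled too. The names SITE, FALLBACK_COVER and build_cover_url are made up for this illustration:

```python
from urllib.parse import urljoin

# Hypothetical constants for this sketch; the URLs are the ones used in the recipe.
SITE = 'http://www.newyorker.com'
FALLBACK_COVER = 'http://www.newyorker.com/images/covers/1925/1925_02_21_p233.jpg'


def build_cover_url(src):
    """Turn an <img> src value into an absolute cover URL.

    Falls back to the fixed cover when src is missing or blank.
    """
    if not src or not src.strip():
        return FALLBACK_COVER
    # urljoin leaves absolute URLs untouched and resolves relative ones against SITE.
    return urljoin(SITE, src.strip())
```

Inside the recipe this would replace the concatenation line, e.g. cover_url = build_cover_url(cover_item.div.img['src']), but I haven't needed it so far since the site has only served relative paths.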
Also, for those who don't actually live in New York, the "Goings On About Town" articles are probably pretty useless and only clutter up the download, so here's a version of the recipe that suppresses them:
Spoiler:
Code:
__license__   = 'GPL v3'
__copyright__ = '2008-2011, Darko Miletic <darko.miletic at gmail.com>'
'''
newyorker.com
'''

from calibre.web.feeds.news import BasicNewsRecipe

class NewYorker(BasicNewsRecipe):
    title                 = 'The New Yorker'
    __author__            = 'Darko Miletic'
    description           = 'Free Articles'
    oldest_article        = 7
    language              = 'en'
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    publisher             = 'Conde Nast Publications'
    category              = 'news'
    encoding              = 'cp1252'
    publication_type      = 'magazine'
    masthead_url          = 'http://www.newyorker.com/css/i/hed/logo.gif'
    extra_css             = """
                                body {font-family: "Times New Roman",Times,serif}
                                .articleauthor{color: #9F9F9F; 
                                               font-family: Arial, sans-serif;
                                               font-size: small; 
                                               text-transform: uppercase}
                                .rubric,.dd,h6#credit{color: #CD0021;
                                        font-family: Arial, sans-serif;
                                        font-size: small;
                                        text-transform: uppercase}
                                .descender:first-letter{display: inline; font-size: xx-large; font-weight: bold}
                                .dd,h6#credit{color: gray}
                                .c{display: block}
                                .caption,h2#articleintro{font-style: italic}
                                .caption{font-size: small}
                            """

    conversion_options = {
                          'comment'   : description
                        , 'tags'      : category
                        , 'publisher' : publisher
                        , 'language'  : language
                        }

    keep_only_tags = [
                        dict(name='div', attrs={'class':'headers'})
                       ,dict(name='div', attrs={'id':['articleheads','items-container','articleRail','articletext','photocredits']})
                     ]
    remove_tags    = [
                         dict(name=['meta','iframe','base','link','embed','object'])
                        ,dict(attrs={'class':['utils','socialUtils','articleRailLinks','icons'] })
                        ,dict(attrs={'id':['show-header','show-footer'] })
                     ]
    remove_attributes = ['lang']
    feeds             = [
                          (u'Reporting', u'http://www.newyorker.com/services/mrss/feeds/reporting.xml')
                         ,(u'Arts', u'http://www.newyorker.com/services/mrss/feeds/arts.xml')
                         ,(u'Humor', u'http://www.newyorker.com/services/mrss/feeds/humor.xml')
                         ,(u'Culture', u'http://www.newyorker.com/online/blogs/culture/rss.xml')
                        ]

    # remove unwanted articles
    def parse_feeds(self):

        # Call parent's method.
        feeds = BasicNewsRecipe.parse_feeds(self)

        # Loop through all feeds.
        for feed in feeds:

            # Loop through all articles in feed.
            for article in feed.articles[:]:

                # No "Goings on about town" articles
                if 'GOINGS ON' in article.title.upper():
                    feed.articles.remove(article)

        return feeds


    def print_version(self, url):
        return url + '?printable=true'

    def image_url_processor(self, baseurl, url):
        return url.strip()

    def get_cover_url(self):
        cover_url = "http://www.newyorker.com/images/covers/1925/1925_02_21_p233.jpg"
        soup = self.index_to_soup('http://www.newyorker.com/magazine?intcid=magazine')
        cover_item = soup.find('div',attrs={'id':'media-count-1'})
        if cover_item:
            cover_url = 'http://www.newyorker.com' + cover_item.div.img['src'].strip()
        return cover_url

    def preprocess_html(self, soup):
        for item in soup.findAll(style=True):
            del item['style']
        # Replace the author link with its plain text.
        auth = soup.find(attrs={'id':'articleauthor'})
        if auth:
            alink = auth.find('a')
            if alink and alink.string is not None:
                txt = alink.string
                alink.replaceWith(txt)
        return soup
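
If you want to try the title filter from parse_feeds without running a full download, here is a minimal stand-alone sketch. The Article and Feed classes below are simplified stand-ins invented for this example (the real calibre feed objects carry much more), and drop_articles is a hypothetical helper name, not something calibre provides:

```python
class Article:
    """Simplified stand-in for a feed article; only the title matters here."""
    def __init__(self, title):
        self.title = title


class Feed:
    """Simplified stand-in for a feed: a title plus a list of articles."""
    def __init__(self, title, articles):
        self.title = title
        self.articles = articles


def drop_articles(feeds, unwanted='GOINGS ON'):
    """Remove every article whose title contains `unwanted` (case-insensitive)."""
    needle = unwanted.upper()
    for feed in feeds:
        # Slice assignment edits the existing list in place, like the remove() loop in the recipe.
        feed.articles[:] = [a for a in feed.articles
                            if needle not in (a.title or '').upper()]
    return feeds


if __name__ == '__main__':
    feeds = [Feed('Arts', [Article('Goings On About Town: Art'),
                           Article('A Critic at Large')])]
    for feed in drop_articles(feeds):
        print(feed.title, [a.title for a in feed.articles])
```

Changing the `unwanted` string should be enough to suppress some other recurring section, as long as its name shows up in the article titles.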

To use this instead of the default recipe, follow these steps:
  • Click on the arrow next to the "Fetch news" icon
  • Select "Add a custom news source"
  • Click the "Customize builtin recipe" button
  • Select the entry for The New Yorker, and click OK
  • Select the new entry from the left "user recipe" listbox
  • Replace the code now listed on the right side with the code posted above (beneath the spoiler tag)
  • Click the "Add/Update recipe" button
  • Confirm replacement with "Yes"
  • Close the custom recipe window, and confirm with "Yes"
The modified recipe will now show up under "Custom" in the news download section, and can be scheduled from there (after which it will also be shown in the "Scheduled" group).