Old 01-26-2012, 12:14 AM   #1
Dreading
Member
Dreading began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Dec 2011
Device: Fire and Sony Pocket E-Reader
Recipe Request: High Country News

Hi, I would love to get a recipe for High Country News.

http://www.hcn.org/rss

I'm not currently a paid digital subscriber, but I have been in the past and would be willing to be again if someone was willing to write a recipe.

Thanks!
Old 01-28-2012, 11:07 AM   #2
Divingduck
Fanatic
Divingduck never is beset by a damp, drizzly November in his or her soul.
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Let me know if this is what you're looking for.

Spoiler:
Code:
# -*- coding: utf-8 -*-
__license__   = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid at kovidgoyal.net>, Armin Geller'

'''
Fetch High Country News
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HighCountryNews(BasicNewsRecipe):

    title = u'High Country News'
    description = u'High Country News (RSS Version)'
    __author__ = 'Armin Geller' # 2012-01-28
    publisher = 'High Country News'
    category = 'news, politics'
    timefmt  = ' [%a, %d %b %Y]'
    language = 'en'
    encoding = 'UTF-8'
    publication_type      = 'newspaper'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets = True 
    auto_cleanup = True
    remove_javascript = True
    use_embedded_content  = False  

    
    feeds = [
              (u'Most recent', u'http://feeds.feedburner.com/hcn/most-recent'),
              (u'Current Issue', u'http://feeds.feedburner.com/hcn/current-issue'),

              (u'Writers on the Range', u'http://feeds.feedburner.com/hcn/wotr'),
              (u'High Country Views', u'http://feeds.feedburner.com/hcn/HighCountryViews'),
             ]
 
    def print_version(self, url):
          return url + '/print_view'
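
The print_version hook above appends /print_view to whatever link the feed supplies. As a rough sketch only, assuming the feed links may carry a trailing slash or a tracking query string (nothing in this thread confirms that), a hypothetical helper named print_view_url could normalize the link first. Dropping the query string is itself an assumption that the print view does not need it, and the example URL is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def print_view_url(url):
    """Return the print-view variant of an article URL.

    Hypothetical helper, not part of calibre: strips any query string,
    fragment and trailing slash before appending '/print_view'.
    """
    parts = urlsplit(url)
    path = parts.path.rstrip('/') + '/print_view'
    # Rebuild the URL with an empty query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, path, '', ''))


if __name__ == '__main__':
    # Made-up article link with a tracking parameter.
    print(print_view_url('http://www.hcn.org/issues/44.1/example-article/?utm_source=feed'))
```

Inside the recipe, print_version(self, url) would then just return print_view_url(url).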


If it's working for you, we can ask Kovid to include it in calibre. Let me know.
Attached Files
File Type: zip HighCountryNews_AGe.zip (732 Bytes, 54 views)
Old 01-30-2012, 12:16 AM   #3
Dreading
Member
Dreading began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Dec 2011
Device: Fire and Sony Pocket E-Reader
Thanks so much, I really appreciate it. Let me know if I can pay it back or forward in any way.

Last edited by Dreading; 01-30-2012 at 12:27 AM. Reason: I had originally messed up with adding it somehow, but I tried again and it looks great on my Calibre E-reader and my Fire.
Old 01-30-2012, 02:10 AM   #4
Divingduck
Fanatic
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
You're welcome. I learned a bit about your country while writing the recipe. You should get an update tomorrow; the new version will have a cover.
Old 01-31-2012, 02:01 AM   #5
Divingduck
Fanatic
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Here is the final recipe. Let me know if something isn't working or you have questions.

Spoiler:
Code:
# -*- coding: utf-8 -*-
__license__   = 'GPL v3'
__copyright__ = '2012, Kovid Goyal <kovid at kovidgoyal.net>, Armin Geller'

'''
Fetch High Country News
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HighCountryNews(BasicNewsRecipe):

    title                 = u'High Country News'
    description           = u'High Country News (RSS Version)'
    __author__            = 'Armin Geller' # 2012-01-31
    publisher             = 'High Country News'
    category              = 'news, politics'
    timefmt               = ' [%a, %d %b %Y]'
    language              = 'en'
    encoding              = 'UTF-8'
    publication_type      = 'newspaper'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True 
    auto_cleanup          = True
    remove_javascript     = True
    use_embedded_content  = False  
    masthead_url          = 'http://www.hcn.org/logo.jpg' # 2012-01-31 AGe add
    cover_source          = 'http://www.hcn.org'          # 2012-01-31 AGe add
    
    def get_cover_url(self):                              # 2012-01-31 AGe add
       cover_source_soup = self.index_to_soup(self.cover_source)
       preview_image_div = cover_source_soup.find(attrs={'class':' portaltype-Plone Site content--hcn template-homepage_view'})
       return preview_image_div.div.img['src']
    
    feeds = [
              (u'Most recent', u'http://feeds.feedburner.com/hcn/most-recent'),
              (u'Current Issue', u'http://feeds.feedburner.com/hcn/current-issue'),

              (u'Writers on the Range', u'http://feeds.feedburner.com/hcn/wotr'),
              (u'High Country Views', u'http://feeds.feedburner.com/hcn/HighCountryViews'),
             ]
 
    def print_version(self, url):
          return url + '/print_view'
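
get_cover_url above returns the img src exactly as it appears on the page, and it raises an exception if the expected div is missing. Here is a minimal sketch of a more defensive last step, assuming the src may be relative or absent. The helper name resolve_cover_url and the example image path are hypothetical. As far as I know, returning None from get_cover_url is allowed and leaves calibre to use its default cover.

```python
from urllib.parse import urljoin


def resolve_cover_url(page_url, img_src):
    """Turn the src found on the cover page into an absolute URL.

    Hypothetical helper, not part of calibre. Returns None when no
    src was found, instead of raising.
    """
    if not img_src:
        return None
    # urljoin leaves absolute URLs untouched and resolves relative ones.
    return urljoin(page_url, img_src)


if __name__ == '__main__':
    # Made-up image path, for illustration only.
    print(resolve_cover_url('http://www.hcn.org', '/cover-example.jpg'))
    print(resolve_cover_url('http://www.hcn.org', None))
```

In the recipe, the find() result would be checked for None first, and only then would the img src be passed to this helper together with self.cover_source.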
Attached Files
File Type: zip HighCountryNews_AGeV1.0.zip (925 Bytes, 52 views)
Old 02-01-2012, 10:59 AM   #6
Dreading
Member
Dreading began at the beginning.
 
Posts: 10
Karma: 10
Join Date: Dec 2011
Device: Fire and Sony Pocket E-Reader
Hi, I just tested it out, and it works great. I'll let you know if there are any problems, but it looks like I'm good to go. Thanks again!
Old 08-18-2013, 09:33 AM   #7
Divingduck
Fanatic
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
I have updated this recipe. It now includes the High Country News blogs, so there is no longer any need to use two recipes for HCN feed content. I also changed the method used to extract the article content, so some of the articles have pictures again. As I haven't found an error in the last 8 weeks, here is the new version:

Spoiler:
Code:
# -*- coding: utf-8 -*-
##
## Written:      2012-01-28
## Last Edited:  2013-08-18
## Remark:       Version 1.2 
##               Integration of former separated Blog-News
##
__license__   = 'GPL v3'
__copyright__ = '2013, Armin Geller'

'''
Fetch High Country News
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HighCountryNews(BasicNewsRecipe):

    title                 = u'High Country News'
    description           = u'High Country News (RSS Version)'
    __author__            = 'Armin Geller'
    publisher             = 'High Country News'
    category              = 'news, politics'
    timefmt               = ' [%a, %d %b %Y]'
    language              = 'en'
    encoding              = 'UTF-8'
    publication_type      = 'newspaper'
    oldest_article        = 14
    max_articles_per_feed = 100
    no_stylesheets        = True 
    auto_cleanup          = False
    remove_javascript     = True
    remove_empty_feeds    = True  # 2013-08-18 AGe add
    use_embedded_content  = False  
    
    masthead_url          = 'http://www.hcn.org/logo.jpg'
    cover_source          = 'http://www.hcn.org'
    
    def get_cover_url(self):
       cover_source_soup = self.index_to_soup(self.cover_source)
       preview_image_div = cover_source_soup.find(attrs={'class':' portaltype-Plone Site content--hcn template-homepage_view'})
       return preview_image_div.div.img['src']

    
    feeds = [
              (u'Most recent', u'http://feeds.feedburner.com/hcn/most-recent?format=xml'),
              (u'Current Issue', u'http://feeds.feedburner.com/hcn/current-issue?format=xml'),
              
              (u'From the Blogs', u'http://feeds.feedburner.com/hcn/FromTheBlogs?format=xml'), # 2013-07-23 AGe add
              (u'Heard around the West', u'http://feeds.feedburner.com/hcn/heard?format=xml'), # 2013-07-23 AGe add
              (u'The GOAT Blog', u'http://feeds.feedburner.com/hcn/goat?format=xml'),          # 2013-07-23 AGe add  
              (u'The Range', u'http://feeds.feedburner.com/hcn/range?format=xml'),             # 2013-07-23 AGe add

              (u'Writers on the Range', u'http://feeds.feedburner.com/hcn/wotr'),
              (u'High Country Views', u'http://feeds.feedburner.com/hcn/HighCountryViews'),
             ]
 
 # 2013-07-23 AGe New coding w/o using print_version
 
    keep_only_tags    = [
                          dict(name='div', attrs={'id':['content']}),
                        ]

    remove_tags = [
                    dict(name='div', attrs={'class':['documentActions supercedeDocumentActions editorialDocumentActions', 
                                                      'documentActions supercedeDocumentActions editorialDocumentActions editorialFooterDocumentActions',
                                                      'article-sidebar',
                                                      'image-viewer-controls nojs',
                                                      'protectedArticleWrapper',
                                                      'visualClear',
                                                     ]})
                  ]
 
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('span',attrs={'class':'next'})
        # Debug output goes through the recipe logger; a bare print statement is a syntax error on Python 3
        self.log.debug('AGE-append_page-------------->: ', pager)
        if pager:
           nexturl = self.INDEX + pager.a['href']
           self.log.debug('AGE--------------->: ', nexturl)
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'article-text'})
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'listingBar listingBar-article'})
        if pager:
           pager.extract()
        return self.adeify_images(soup)
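
append_page above follows the "next" pager link recursively and splices each following page into the first one. The same idea can be sketched outside calibre, with an in-memory dictionary standing in for the HTTP fetches. The names collect_pages, fetch and PAGES are hypothetical, and the guard against link cycles and the page limit are additions that the recipe itself does not have.

```python
def collect_pages(first_url, fetch, max_pages=20):
    """Follow 'next' links from first_url and return the page texts in order.

    fetch(url) must return (text, next_url_or_None). It stands in for
    index_to_soup plus the pager lookup done in the recipe.
    """
    texts = []
    seen = set()
    url = first_url
    # Stop on the last page, on a link cycle, or after max_pages pages.
    while url is not None and url not in seen and len(seen) < max_pages:
        seen.add(url)
        text, url = fetch(url)
        texts.append(text)
    return texts


# Made-up three-page article whose last page wrongly links back to page 1.
PAGES = {
    '/a': ('page 1', '/a?page=2'),
    '/a?page=2': ('page 2', '/a?page=3'),
    '/a?page=3': ('page 3', '/a'),
}

if __name__ == '__main__':
    print(collect_pages('/a', PAGES.__getitem__))  # ['page 1', 'page 2', 'page 3']
```

An iterative loop with a seen set avoids unbounded recursion if a site ever produces a pager that points back at an earlier page.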


Some remarks on HCN and this recipe:

Unfortunately, HCN doesn't update its content very often, especially the blogs. If you would like to see older articles, change the entry oldest_article = 14 in the recipe to a value that suits you better. A value of 100 (days) results in an 8.3 MB EPUB with all the feeds currently in the recipe. I set it to 14 because that seems to match how often the content is updated. You will find the best setting for your needs.

The feed called “High Country Views” also contains entries starting with “West of 100: …”. These entries are podcasts that HCN has unfortunately discontinued. They are still available in the feed and I didn't remove them, so if you are sitting at a PC with the calibre viewer, you can follow the article link to listen to them. Remember to increase oldest_article for this, because the oldest audio file is from February 28, 2011. There are 15 audio files available.
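
To make the oldest_article remark concrete, here is a small sketch of the arithmetic, assuming the cutoff is a plain age in days measured from the time of download. calibre's own filtering may treat time zones and missing dates differently. The function names are hypothetical, and the download date used is simply the date of this post.

```python
from datetime import datetime, timedelta


def within_cutoff(published, now, oldest_article):
    """True if an item published at 'published' is at most oldest_article days old."""
    return now - published <= timedelta(days=oldest_article)


def days_needed(published, now):
    """Smallest whole number of days that still includes the item."""
    delta = now - published
    # Round a partial day up to the next full day.
    return delta.days + (1 if delta.seconds or delta.microseconds else 0)


if __name__ == '__main__':
    oldest_podcast = datetime(2011, 2, 28)  # date given in the remarks above
    download_day = datetime(2013, 8, 18)    # assumed: the date of this post
    print(within_cutoff(oldest_podcast, download_day, 14))
    print(within_cutoff(oldest_podcast, download_day, 100))
    print(days_needed(oldest_podcast, download_day))
```

Under these assumptions, neither 14 nor 100 reaches back to the oldest podcast entry, so oldest_article has to be set considerably higher to get all of them.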

Have a nice Sunday
DivingDuck
Attached Files
File Type: zip HighCountryNews_AGeV1.2.zip (1.5 KB, 34 views)
Old 09-06-2013, 11:26 AM   #8
Divingduck
Fanatic
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
A new update for this recipe. HCN made some changes.

Spoiler:
Code:
# -*- coding: utf-8 -*-
##
## Written:      2012-01-28
## Last Edited:  2013-09-06
## Remark:       Version 1.3 
##               Update cleanup for new web article design
##
__license__   = 'GPL v3'
__copyright__ = '2013, Armin Geller'

'''
Fetch High Country News
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HighCountryNews(BasicNewsRecipe):

    title                 = u'High Country News'
    description           = u'High Country News (RSS Version)'
    __author__            = 'Armin Geller'
    publisher             = 'High Country News'
    category              = 'news, politics'
    timefmt               = ' [%a, %d %b %Y]'
    language              = 'en'
    encoding              = 'UTF-8'
    publication_type      = 'newspaper'
    oldest_article        = 14
    max_articles_per_feed = 100
    no_stylesheets        = True 
    auto_cleanup          = False
    remove_javascript     = True
    remove_empty_feeds    = True
    use_embedded_content  = False  
    
    masthead_url          = 'http://www.hcn.org/logo.jpg'
    cover_source          = 'http://www.hcn.org'
    
    def get_cover_url(self):
       cover_source_soup = self.index_to_soup(self.cover_source)
       preview_image_div = cover_source_soup.find(attrs={'class':' portaltype-Plone Site content--hcn template-homepage_view'})
       return preview_image_div.div.img['src']

    
    feeds = [
              (u'Most recent', u'http://feeds.feedburner.com/hcn/most-recent?format=xml'),
              (u'Current Issue', u'http://feeds.feedburner.com/hcn/current-issue?format=xml'),
              
              (u'From the Blogs', u'http://feeds.feedburner.com/hcn/FromTheBlogs?format=xml'),
              (u'Heard around the West', u'http://feeds.feedburner.com/hcn/heard?format=xml'),
              (u'The GOAT Blog', u'http://feeds.feedburner.com/hcn/goat?format=xml'),
              (u'The Range', u'http://feeds.feedburner.com/hcn/range?format=xml'),

              (u'Writers on the Range', u'http://feeds.feedburner.com/hcn/wotr'),
              (u'High Country Views', u'http://feeds.feedburner.com/hcn/HighCountryViews'),
             ]
 
 # 2013-07-23 AGe New coding w/o using print_version
 
    keep_only_tags    = [
                          dict(name='div', attrs={'id':['content']}),
                        ]

    remove_tags = [
                    dict(name='div', attrs={'class':['documentActions supercedeDocumentActions editorialDocumentActions', 
                                                      'documentActions supercedeDocumentActions editorialDocumentActions editorialFooterDocumentActions',
                                                      'article-sidebar',
                                                      'image-viewer-controls nojs',
                                                      'protectedArticleWrapper',
                                                      'visualClear',
                                                      'feed-icons', #2013-09-06 AGe add
                                                      'PayWallEmail', #2013-09-06 AGe add
                                                     ]}),
                    dict(name='div', attrs={'id':['offer-below-locked-article']}), #2013-09-06 AGe add                                
                  ]
 
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('span',attrs={'class':'next'})
        if pager:
           nexturl = self.INDEX + pager.a['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'article-text'})
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'listingBar listingBar-article'})
        if pager:
           pager.extract()
        return self.adeify_images(soup)
Attached Files
File Type: zip HighCountryNews_AGeV1.3.zip (1.5 KB, 30 views)
Old 09-18-2014, 08:58 AM   #9
Divingduck
Fanatic
 
Posts: 557
Karma: 59934
Join Date: Nov 2010
Location: Germany
Device: Sony PRS-650
Please find attached a new version. HCN has a new web design. I also added extra CSS to get rid of the ugly article design. I hope you like it.

Spoiler:
Code:
# -*- coding: utf-8 -*-
##
## Written:      2012-01-28
## Last Edited:  2014-09-18
## Remark:       Version 2.0 first check 
##               Update cleanup for new web article design and extra css
##
__license__   = 'GPL v3'
__copyright__ = '2013, Armin Geller'

'''
Fetch High Country News
'''
from calibre.web.feeds.news import BasicNewsRecipe
class HighCountryNews(BasicNewsRecipe):

    title                 = u'High Country News'
    description           = u'High Country News (RSS Version)'
    __author__            = 'Armin Geller'
    publisher             = 'High Country News'
    category              = 'news, politics'
    timefmt               = ' [%a, %d %b %Y]'
    language              = 'en'
    encoding              = 'UTF-8'
    publication_type      = 'newspaper'
    oldest_article        = 14
    max_articles_per_feed = 100
    no_stylesheets        = True 
    auto_cleanup          = False
    remove_javascript     = True
    remove_empty_feeds    = True
    use_embedded_content  = False  
    
    masthead_url          = 'http://www.hcn.org/logo.jpg'
    cover_source          = 'http://www.hcn.org/issues' # AGE 2014-09-18 new
    
    def get_cover_url(self):
       cover_source_soup = self.index_to_soup(self.cover_source)
       preview_image_div = cover_source_soup.find(attrs={'class':'articles'}) # AGE 2014-09-18 new
       return preview_image_div.div.a.figure.img['src'] # AGE 2014-09-18 new, always take the first one (hopefully)

    # AGe new extra css to get rid of ugly style
    # li for delete disc style, 
    # caption and credit for description & author of pictures

    extra_css      =  '''
                      h1 {font-size: 1.6em; text-align: left}
                      h2 {font-size: 1em; font-style: italic; font-weight: normal}
                      h3 {font-size: 1.3em;text-align: left}
                      h4, h5, h6 {font-size: 1em;text-align: left}
                      li {list-style-type: none}
                      .caption, .credit {font-size: 0.9em; font-style: italic}
                      '''

    feeds = [
              (u'Most recent', u'http://feeds.feedburner.com/hcn/most-recent?format=xml'),
              (u'Current Issue', u'http://feeds.feedburner.com/hcn/current-issue?format=xml'),
              
              (u'From the Blogs', u'http://feeds.feedburner.com/hcn/FromTheBlogs?format=xml'),
              (u'Heard around the West', u'http://feeds.feedburner.com/hcn/heard?format=xml'),
              (u'The GOAT Blog', u'http://feeds.feedburner.com/hcn/goat?format=xml'),
              (u'The Range', u'http://feeds.feedburner.com/hcn/range?format=xml'),

              (u'Writers on the Range', u'http://feeds.feedburner.com/hcn/wotr'),
              (u'High Country Views', u'http://feeds.feedburner.com/hcn/HighCountryViews'),
             ]

    # 2014-09-18 AGe New coding related to design changes
 
    keep_only_tags    = [
                          dict(name='div', attrs={'id':'content'}),
                          dict(name='div', attrs={'class':'opaque'}),
                        ]

    remove_tags = [
                    dict(name='div', attrs={'class':[
                                                      'large-4 columns right-portlets',
                                                      'small-12 columns',
                                                      'pagination-share',
                                                      'tiny content f-dropdown',
                                                      'image-viewer-controls',
                                                     ]}),
                    dict(name='ul', attrs={'class':[
                                                      'document-actions',
                                                      'topics',
                                                     ]}),
                    dict(name='a', attrs={'name':[
                                                      'body',
                                                     ]}),

                  ]
 
    # AGE 2014-09-18 this will stay for a while
    # but have no impact for now ... 
    
    INDEX                 = ''
    def append_page(self, soup, appendtag, position):
        pager = soup.find('span',attrs={'class':'next'})
        if pager:
           nexturl = self.INDEX + pager.a['href']
           soup2 = self.index_to_soup(nexturl)
           texttag = soup2.find('div', attrs={'class':'article-text'})
           newpos = len(texttag.contents)
           self.append_page(soup2,texttag,newpos)
           texttag.extract()
           appendtag.insert(position,texttag)

    def preprocess_html(self, soup):
        self.append_page(soup, soup.body, 3)
        pager = soup.find('div',attrs={'class':'listingBar listingBar-article'})
        if pager:
           pager.extract()
        return self.adeify_images(soup)
Attached Files
File Type: zip HighCountryNews_AGeV2.0.zip (1.7 KB, 10 views)