Old 09-22-2010, 02:27 PM   #2816
Starson17
Wizard
Quote:
Originally Posted by Thetasquared
I would like to request a recipe for the "Current Issue" of Science News.

The rss feed is:
http://www.sciencenews.org/view/feed...ame/issues.rss

I know that a Science News recipe exists, but simply switching the feeds does not return articles for the "current issue".
Here's a quick and dirty version. Why don't you look it over and spot what still needs cleaning up? Post here and I'll address it. I really like Science News.
Code:
#!/usr/bin/env  python

__license__   = 'GPL v3'
'''
sciencenews.org
'''
from calibre.web.feeds.news import BasicNewsRecipe

class Sciencenews(BasicNewsRecipe):
    title                 = u'Science News Current Issues'
    __author__            = u'Starson17'
    description           = u"Science News is an award-winning weekly newsmagazine covering the most important research in all fields of science. Its 16 pages each week are packed with short, accurate articles that appeal to both general readers and scientists. Published since 1922, the magazine now reaches about 150,000 subscribers and more than 1 million readers. These are the latest News Items from Science News."
    oldest_article        = 30
    language              = 'en'

    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False
    timefmt               = ' [%A, %d %B, %Y]'
    recursions            = 1
    
    extra_css = '''
                .content_description{font-family:georgia ;font-size:x-large; color:#646464 ; font-weight:bold;}
                .content_summary{font-family:georgia ;font-size:small ;color:#585858 ; font-weight:bold;}
                .content_authors{font-family:helvetica,arial ;font-size: xx-small ;color:#14487E ;}
                .content_edition{font-family:helvetica,arial ;font-size: xx-small ;}
                .exclusive{color:#FF0000 ;}
                .anonymous{color:#14487E ;}
                .content_content{font-family:helvetica,arial ;font-size: x-small ; color:#000000;}
                .description{color:#585858;font-family:helvetica,arial ;font-size: xx-small ;}
                .credit{color:#A6A6A6;font-family:helvetica,arial ;font-size: xx-small ;}
                '''

    keep_only_tags = [ dict(name='div', attrs={'id':'column_action'}) ]
    remove_tags_after = dict(name='ul', attrs={'id':'content_functions_bottom'})
    remove_tags = [
        dict(name='ul', attrs={'id':'content_functions_bottom'}),
        dict(name='div', attrs={'id':['content_functions_top','breadcrumb_content']}),
        dict(name='img', attrs={'class':'icon'}),
        dict(name='div', attrs={'class':'embiggen'}),
    ]
    
    feeds       = [(u"Science News Current Issues", u'http://www.sciencenews.org/view/feed/type/edition/name/issues.rss')]

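    # With recursions = 1, only links matching one of these patterns are followed.
    # The dots are escaped so they match a literal '.' rather than any character.
    match_regexps = [
            r'www\.sciencenews\.org/view/feature/id/',
            r'www\.sciencenews\.org/view/generic/id'
            ]
    # Example article URL:
    # http://www.sciencenews.org/view/feature/id/63177/title/Fire_%2Bamp%3B_Ice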
    
    def get_cover_url(self):
        # Look for the current issue's cover image on the home page.
        cover_url = None
        index = 'http://www.sciencenews.org/view/home'
        soup = self.index_to_soup(index)
        link_item = soup.find(name='img', alt="issue")
        print(link_item)  # debug output; safe to remove once the cover works
        if link_item:
            cover_url = 'http://www.sciencenews.org' + link_item['src'] + '.jpg'

        return cover_url

    def preprocess_html(self, soup):
        # Turn every <span> into a <div> before the page is processed further.
        for tag in soup.findAll(name=['span']):
            tag.name = 'div'
        return soup
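
If you want to see which links the recipe's match_regexps would accept without running a full download, here is a minimal sketch in plain Python. It assumes the patterns are applied as a case-insensitive search anywhere in the URL, which is my reading of how calibre treats match_regexps. The helper name is_link_wanted and the sample list are hypothetical and only for illustration; the first sample URL is the one from the comment in the recipe.

```python
import re

# Patterns taken from the recipe's match_regexps, with the dots escaped.
MATCH_REGEXPS = [
    r'www\.sciencenews\.org/view/feature/id/',
    r'www\.sciencenews\.org/view/generic/id',
]


def is_link_wanted(url, patterns=MATCH_REGEXPS):
    """Hypothetical helper: True if any pattern is found anywhere in url.

    Assumption: a search-style, case-insensitive match, not a full match.
    """
    return any(re.search(p, url, re.IGNORECASE) for p in patterns)


if __name__ == '__main__':
    samples = [
        # Article URL from the comment in the recipe.
        'http://www.sciencenews.org/view/feature/id/63177/title/Fire_%2Bamp%3B_Ice',
        # Home page, which should not be followed as an article.
        'http://www.sciencenews.org/view/home',
    ]
    for url in samples:
        print(is_link_wanted(url), url)
```

Any article type whose URL does not contain /view/feature/id/ or /view/generic/id will be skipped, so this is a quick way to try out extra patterns before adding them to the recipe.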