Old 01-08-2019, 09:47 AM   #1
amj
Junior Member
amj began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2019
Device: Kindle Paperwhite
News Fetch from Scientific American failing

This was a known issue. I did not know that at the time of reporting.

Sorry for the inconvenience. Please delete this thread.

A subscription to Scientific American is required to fetch it.

I guess the recipe description, which says a subscription is optional, should be corrected.

Thanks!
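
For anyone who finds this later: as far as I can tell, whether credentials are needed is declared through the recipe's needs_subscription attribute. Here is a minimal sketch of a "subscription required" variant (the class name and title below are only illustrative, not the recipe that ships with calibre):

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class SciAmLoginRequired(BasicNewsRecipe):
    # Illustrative subclass only, not the shipped recipe.
    title = 'Scientific American (login required)'
    # The shipped recipe declares needs_subscription = 'optional', which
    # lets the download run without credentials; setting it to True makes
    # calibre ask for a username and password before fetching.
    needs_subscription = True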

Last edited by amj; 01-08-2019 at 10:08 AM. Reason: Known issue
Old 01-08-2019, 08:14 PM   #2
lui1
Member
lui1 began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
Update for the Scientific American recipe

Actually, you can still download the first few sentences of each article, so you can see what's in them, and some of them do download completely. This code works for me if you want to try it.

Update to the Scientific American Recipe
Code:
#!/usr/bin/env python2
__license__ = 'GPL v3'

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.utils.date import now
from css_selectors import Select


def absurl(url):
    if url.startswith('/'):
        url = 'http://www.scientificamerican.com' + url
    return url

# CSS classes that mark the article content to keep and the page
# furniture to strip; used by keep_only_tags/remove_tags below.
keep_classes = {'article-header', 'article-content',
                'article-media', 'article-author', 'article-text'}
remove_classes = {'aside-banner', 'moreToExplore', 'article-footer'}


class ScientificAmerican(BasicNewsRecipe):
    title = u'Scientific American'
    description = u'Popular Science. Monthly magazine. Should be downloaded around the middle of each month.'
    category = 'science'
    __author__ = 'Kovid Goyal'
    no_stylesheets = True
    language = 'en'
    publisher = 'Nature Publishing Group'
    remove_empty_feeds = True
    remove_javascript = True
    timefmt = ' [%B %Y]'

    needs_subscription = 'optional'

    keep_only_tags = [
        dict(attrs={'class': lambda x: x and bool(
            set(x.split()).intersection(keep_classes))}),
    ]
    remove_tags = [
        dict(attrs={'class': lambda x: x and bool(
            set(x.split()).intersection(remove_classes))}),
        dict(id=['seeAlsoLinks']),
    ]

    def get_browser(self, *args):
        br = BasicNewsRecipe.get_browser(self)
        if self.username and self.password:
            br.open('https://www.scientificamerican.com/my-account/login/')
            br.select_form(predicate=lambda f: f.attrs.get('id') == 'login')
            br['emailAddress'] = self.username
            br['password'] = self.password
            br.submit()
        return br

    def parse_index(self):
        # Get the cover, date and issue URL
        root = self.index_to_soup(
            'http://www.scientificamerican.com/sciammag/', as_tree=True)
        select = Select(root)
        url = [x.get('content', '') for x in select('html > head meta')
               if x.get('property', None) == 'og:url'][0]
        self.cover_url = [x.get('src', '') for x in select('main .product-detail__image picture img')][0]

        # Now parse the actual issue to get the list of articles
        select = Select(self.index_to_soup(url, as_tree=True))
        feeds = []
        for i, section in enumerate(select('#sa_body .toc-articles')):
            if i == 0:
                feeds.append(
                    ('Features', list(self.parse_sciam_features(select, section))))
            else:
                feeds.extend(self.parse_sciam_departments(select, section))

        return feeds

    def parse_sciam_features(self, select, section):
        for article in select('article[data-article-title]', section):
            title = article.get('data-article-title')
            for a in select('a[href]', article):
                url = absurl(a.get('href'))
                break
            desc = ''
            for p in select('p.t_body', article):
                desc = self.tag_to_string(p)
                break
            self.log('Found feature article: %s at %s' % (title, url))
            self.log('\t' + desc)
            yield {'title': title, 'url': url, 'description': desc}

    def parse_sciam_departments(self, select, section):
        section_title, articles = 'Unknown', []
        for li in select('li[data-article-title]', section):
            for span in select('span.department-title', li):
                if articles:
                    yield section_title, articles
                section_title, articles = self.tag_to_string(span), []
                self.log('\nFound section: %s' % section_title)
                break
            for a in select('h2 a[href]', li):
                title = self.tag_to_string(a)
                url = absurl(a.get('href'))
                articles.append(
                    {'title': title, 'url': url, 'description': ''})
                self.log('\tFound article: %s at %s' % (title, url))
        if articles:
            yield section_title, articles
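
If you want to try the recipe without adding it to the calibre GUI first, you can save the code above to a file such as scientific_american.recipe (the filename is just an example) and run it through ebook-convert; --test only fetches a couple of articles per feed and -vv prints verbose logging, which makes debugging quicker:

Code:
ebook-convert scientific_american.recipe output.epub --test -vv

If you have a subscription, your calibre version should also accept --username and --password on that command line so the recipe's login code gets exercised.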
Old 01-09-2019, 05:07 AM   #3
amj
Junior Member
amj began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Jan 2019
Device: Kindle Paperwhite
Thanks a ton! Your code snippet works for me. Even the out-of-the-box version is working now. I can't understand why this happened.
Tags
recipe, scientific american
