Popular Science recipe is broken.

NSILMike · 11-15-2017, 03:20 PM

It just downloads titles and links:

Five rad and random things I found this week

The end-of-week dispatch from PopSci's commerce editor. Vol. 27.

By Billy Cadden posted Oct 27th, 2017 at 12:15pm
This article was downloaded by calibre from https://www.popsci.com/rad-and-rando...efault&src=syn

lui1 · 12-28-2017, 05:37 AM

Hello There,
I updated the recipe so that it finds the body of the article again. Apparently they changed the CSS class name of the div containing the main text. Anyway, here's the recipe:

Popular Science

Code:

from calibre.web.feeds.news import BasicNewsRecipe
import re

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'Kovid Goyal'
    description = 'Popular Science'
    publisher = 'Popular Science'
    oldest_article = 7  # change this if you want more current articles. I like to go a week in
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    remove_empty_feeds = True
    ignore_duplicate_articles = {'url'}

    feeds = [

        ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
        ('Cars', 'http://www.popsci.com/full-feed/cars'),
        ('Science', 'http://www.popsci.com/full-feed/science'),
        ('Technology', 'http://www.popsci.com/full-feed/technology'),
        ('DIY', 'http://www.popsci.com/full-feed/diy'),
        ('Animals', 'https://www.popsci.com/rss-animals.xml'),
        ('Space', 'https://www.popsci.com/rss-space.xml'),
        ('Environment', 'https://www.popsci.com/rss-environment.xml'),
        ('Eastern Arsenal', 'https://www.popsci.com/rss-eastern-arsenal.xml'),

    ]
    
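    # Raw string so that \w reaches the regex engine as-is instead of being
    # treated as an (invalid) string escape.
    pane_node_body = re.compile(r'pane-node-(?:\w+-){0,9}body')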
    
    keep_only_tags = [
        dict(attrs={'class': lambda x: x and frozenset('pane-node-header'.split()).issubset(frozenset(x.split())) }),
        dict(attrs={'class': pane_node_body}),
    ]

    remove_tags = [
        dict(attrs={'class': lambda x: x and frozenset('ads seperator'.split()).issubset(frozenset(x.split())) }),
    ]

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-medsrc': True}):
            img['src'] = img['data-medsrc']
        return soup
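
For anyone who wants to check the class matching without running a full download, here is a minimal sketch that uses only the standard library. It reuses the recipe's regex and the same subset test as the lambdas in keep_only_tags and remove_tags. The helper name has_classes and the sample class strings are made up for illustration; they are not taken from the actual popsci.com markup, and the sketch does not reproduce how calibre's parser applies these matchers to a page.

```python
import re

# Same pattern as in the recipe, written as a raw string.
pane_node_body = re.compile(r'pane-node-(?:\w+-){0,9}body')


def has_classes(required):
    """Build a matcher that is true when every class in `required` is present.

    Hypothetical helper: it mirrors the frozenset/issubset lambdas in the recipe.
    """
    wanted = frozenset(required.split())
    return lambda value: bool(value) and wanted.issubset(value.split())


# Hypothetical class attribute values, invented for this example only.
samples = [
    'pane-node-body',
    'panel-pane pane-node-field-body',
    'pane-node-header',
    'ads seperator top',
]

# Strings the body regex finds a match in.
body_hits = [s for s in samples if pane_node_body.search(s)]
# Strings carrying the header class.
header_hits = [s for s in samples if has_classes('pane-node-header')(s)]
# Strings carrying both classes that remove_tags looks for.
ad_hits = [s for s in samples if has_classes('ads seperator')(s)]

print(body_hits)
print(header_hits)
print(ad_hits)
```

The optional (?:\w+-){0,9} group is what lets the pattern tolerate extra words between "pane-node-" and "body", so a further rename along those lines should still match.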

NSILMike · 12-28-2017, 10:27 AM

Thanks! Looks quite good now!
Happy new year.



