10-16-2020, 12:58 PM   #1
gourav
Member
Posts: 14
Karma: 132
Join Date: Aug 2014
Device: Kindle Paperwhite 7th Gen
Creating a recipe for Scroll.in

I created a recipe for scroll.in.

It is working, in the sense that it creates a proper magazine which is readable on Kindle.

However, the recipe defines 20 sections, and 10 of them fail with a permanent redirect error. This is the error message:
Code:
URL:  https://scroll.in/food
<httperror_seek_wrapper (urllib.error.HTTPError instance) at 0x6208928 whose wrapped object = <HTTPError 308: 'PERMANENT REDIRECT'>>
But when I run the exact same code using the requests library in a regular Python script, it is able to fetch all sections. Can anyone help me resolve this?

Here's the full code:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1602773140(BasicNewsRecipe):
    title          = 'Scroll.in'
    oldest_article = 2
    max_articles_per_feed = 20
    auto_cleanup   = True
    
    compress_news_images = True
    compress_news_images_auto_size = 24

    def parse_index(self):
        
        category_list = [
            ('coronavirus-crisis', 'https://scroll.in/topic/56256/coronavirus-crisis'),
            ('food', 'https://scroll.in/food'),
            ('latest', 'https://scroll.in/latest'),
            ('reel', 'https://scroll.in/reel'),
            ('field', 'https://scroll.in/field'),
            ('magazine', 'https://scroll.in/magazine'),
            ('politics', 'https://scroll.in/category/76/politics'),
            ('culture', 'https://scroll.in/category/107/culture'),
            ('india', 'https://scroll.in/category/105/india'),
            ('world', 'https://scroll.in/category/3554/world'),
            ('film-and-tv', 'https://scroll.in/category/3/film-and-tv'),
            ('music', 'https://scroll.in/category/4/music'),
            ('books-and-ideas', 'https://scroll.in/category/80/books-and-ideas'),
            ('business-and-economy', 'https://scroll.in/category/77/business-and-economy'),
            ('science-and-technology', 'https://scroll.in/category/83/science-and-technology'),
            ('roving', 'https://scroll.in/roving'),
            ('global', 'https://scroll.in/global'),
            ('announcements', 'https://scroll.in/announcements'),
            ('pulse', 'https://scroll.in/pulse'),
            ('theplus', 'https://scroll.in/theplus')
        ]

        br = self.get_browser()
        
        feeds = []

        for category in category_list:
            print('URL: ', category[1])
            try:
                page = br.open(category[1])
                html = page.read()
            except Exception as e:
                print(repr(e))
                continue
            
            soup = BeautifulSoup(html)

            stories = soup.find_all('div', class_='row-story-meta')

            articles = []

            for story in stories:
                article = story.find_parent()
                author = story.find('address')
                author = author.text if author is not None else 'Scroll.in'
                article_dict = {'url': article['href'],
                                'title': story.find('h1').text,
                                'date': story.find('time')['datetime'],
                                'author': author}
                articles.append(article_dict)

            feeds.append((category[0], articles))
            
        return feeds
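
Since the 308 surfaces as an HTTPError in the output above, one possible workaround is to catch it and retry at the redirect target. The following is only a rough sketch, not a confirmed fix: the helper name open_following_308 is hypothetical, and it assumes the exception raised by br.open exposes the response headers as err.headers and that the server sends a Location header with the 308.

```python
from urllib.error import HTTPError
from urllib.parse import urljoin


def open_following_308(open_url, url, max_hops=5):
    """Call open_url(url); on HTTP 308, retry at the Location target.

    open_url is any callable that returns a response or raises HTTPError.
    Passing the recipe's br.open here is an assumption, not tested in calibre.
    """
    for _ in range(max_hops):
        try:
            return open_url(url)
        except HTTPError as err:
            if err.code != 308:
                raise  # leave every other HTTP error untouched
            location = err.headers.get('Location') if err.headers else None
            if not location:
                raise  # a 308 without a target cannot be followed
            # Location may be relative, so resolve it against the current URL
            url = urljoin(url, location)
    raise RuntimeError('too many 308 redirects, last URL: %s' % url)
```

In the loop above this would replace the direct call, for example page = open_following_308(br.open, category[1]). The max_hops limit is there so that a redirect loop ends with an error instead of running forever.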