I created a recipe for scroll.in.
It works, in the sense that it creates a proper magazine that is readable on Kindle.
However, the recipe has 20 sections, and 10 of them fail with a permanent redirect error. The error message is:
Code:
URL: https://scroll.in/food
<httperror_seek_wrapper (urllib.error.HTTPError instance) at 0x6208928 whose wrapped object = <HTTPError 308: 'PERMANENT REDIRECT'>>
But when I run the same logic using the requests library in a regular Python script, it is able to fetch all sections. Can anyone help me resolve this?
Here's the full code:
Code:
#!/usr/bin/env python
# vim:fileencoding=utf-8
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1602773140(BasicNewsRecipe):
    title = 'Scroll.in'
    oldest_article = 2
    max_articles_per_feed = 20
    auto_cleanup = True
    compress_news_images = True
    compress_news_images_auto_size = 24

    def parse_index(self):
        category_list = [
            ('coronavirus-crisis', 'https://scroll.in/topic/56256/coronavirus-crisis'),
            ('food', 'https://scroll.in/food'),
            ('latest', 'https://scroll.in/latest'),
            ('reel', 'https://scroll.in/reel'),
            ('field', 'https://scroll.in/field'),
            ('magazine', 'https://scroll.in/magazine'),
            ('politics', 'https://scroll.in/category/76/politics'),
            ('culture', 'https://scroll.in/category/107/culture'),
            ('india', 'https://scroll.in/category/105/india'),
            ('world', 'https://scroll.in/category/3554/world'),
            ('film-and-tv', 'https://scroll.in/category/3/film-and-tv'),
            ('music', 'https://scroll.in/category/4/music'),
            ('books-and-ideas', 'https://scroll.in/category/80/books-and-ideas'),
            ('business-and-economy', 'https://scroll.in/category/77/business-and-economy'),
            ('science-and-technology', 'https://scroll.in/category/83/science-and-technology'),
            ('roving', 'https://scroll.in/roving'),
            ('global', 'https://scroll.in/global'),
            ('announcements', 'https://scroll.in/announcements'),
            ('pulse', 'https://scroll.in/pulse'),
            ('theplus', 'https://scroll.in/theplus')
        ]
        br = self.get_browser()
        feeds = []
        for category in category_list:
            print('URL: ', category[1])
            try:
                page = br.open(category[1])
                html = page.read()
            except Exception as e:
                print(repr(e))
                continue
            soup = BeautifulSoup(html)
            stories = soup.find_all('div', class_='row-story-meta')
            articles = []
            for story in stories:
                article = story.find_parent()
                author = story.find('address')
                author = author.text if author is not None else 'Scroll.in'
                article_dict = {'url': article['href'],
                                'title': story.find('h1').text,
                                'date': story.find('time')['datetime'],
                                'author': author}
                articles.append(article_dict)
            feeds.append((category[0], articles))
        return feeds
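
One way to narrow this down might be to check where the 308 responses actually point. The following is a minimal sketch, separate from the recipe, that uses only the Python standard library. The names `NoRedirect`, `redirect_target`, and `probe` are made up for illustration and are not part of calibre. It fetches a URL without following redirects and reports the status code together with the redirect target, if there is one.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin

# Status codes treated as redirects in this sketch (308 is the one reported above).
REDIRECT_CODES = {301, 302, 303, 307, 308}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Hypothetical helper: stop urllib from following redirects."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None tells urllib not to follow; it raises HTTPError instead.
        return None


def redirect_target(url, status, location):
    """Return the absolute URL a redirect points to, or None if not a redirect."""
    if status in REDIRECT_CODES and location:
        # Location may be relative, so resolve it against the requested URL.
        return urljoin(url, location)
    return None


def probe(url):
    """Fetch url without following redirects.

    Returns (status code, redirect target or None).
    """
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=30) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        location = err.headers.get('Location')
        return err.code, redirect_target(url, err.code, location)
```

Calling `probe('https://scroll.in/food')` and comparing it with a section that works, such as `probe('https://scroll.in/latest')`, should show whether the failing sections redirect to a different path. If they do, putting the final URL into `category_list` may avoid the redirect altogether. That is an assumption about the cause, not something confirmed here.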