This website has a really old interface but really good content.
https://www.india-seminar.com/
This link:
https://www.india-seminar.com/semframe.html will take you directly to the current issue.
The article links lead to text-only pages
([example]), for which auto_cleanup will work.
All I want is for the recipe to parse the article links into a feed.
Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes


class Seminar(BasicNewsRecipe):
    title = 'India-Seminar Magazine'
    __author__ = 'unkn0wn'
    description = 'Seminar - attempts a departure from the usual journal. Problems, national and international, are posed and discussed. Each issue deals with a single problem. Those who hold different and at times opposing viewpoints express their thoughts'
    language = 'en_GB'
    encoding = 'utf-8'
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
    ignore_duplicate_articles = {'url'}
    auto_cleanup = True

    def parse_index(self):
        # Fetch the page that is supposed to list the current issue.
        soup = self.index_to_soup('https://www.india-seminar.com/semframe.html')
        ans = []
        # Keep only links whose href starts with the issue number.
        # '751/' would need to be changed to '752/' next month.
        for a in soup.findAll('a', href=lambda x: x and x.startswith('751/')):
            url = a['href']
            title = self.tag_to_string(a)
            self.log(title, ' at ', url)
            ans.append({'title': title, 'url': url})
        return [('Articles', ans)]
The hrefs can be found here:
<frame src="2022/751.htm" target="_self" name="action">
#document
html body div.WordSection1 p.MsoNormal b span a
I've tried so many things and nothing worked. I thought this would be really easy.
It won't even give me an error, just an empty EPUB.
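
One possible reason for the empty result is that semframe.html is only a frameset, so the article links sit in the framed page (2022/751.htm) rather than in the page the recipe fetches. Below is a minimal standalone sketch of that two-step lookup, using only the Python standard library so it can be tried outside calibre. The sample HTML and article file names are made up for illustration. It also assumes the hrefs are relative to the framed page and start with the issue number, which I have not verified against the live site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FRAMESET_URL = 'https://www.india-seminar.com/semframe.html'

# Hypothetical stand-ins for the two pages; the real markup may differ.
FRAMESET_HTML = '<frameset><frame src="2022/751.htm" target="_self" name="action"></frameset>'
ISSUE_HTML = '''
<div class="WordSection1">
<p class="MsoNormal"><b><span><a href="751/751_first_article.htm">First article</a></span></b></p>
<p class="MsoNormal"><b><span><a href="751/751_second_article.htm">Second article</a></span></b></p>
<p class="MsoNormal"><a href="../semsearch.htm">Search</a></p>
</div>
'''


class TagCollector(HTMLParser):
    """Collect [attrs, text] for every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.found = []
        self._open = False

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.found.append([dict(attrs), ''])
            self._open = True

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._open = False

    def handle_data(self, data):
        if self._open:
            self.found[-1][1] += data


def frame_url(frameset_html, base_url, name='action'):
    """Return the absolute URL of the frame with the given name, or None."""
    parser = TagCollector('frame')
    parser.feed(frameset_html)
    for attrs, _ in parser.found:
        if attrs.get('name') == name and attrs.get('src'):
            return urljoin(base_url, attrs['src'])
    return None


def article_links(issue_html, issue_url, prefix):
    """Return title/url dicts for links whose href starts with prefix."""
    parser = TagCollector('a')
    parser.feed(issue_html)
    out = []
    for attrs, text in parser.found:
        href = attrs.get('href') or ''
        if href.startswith(prefix):
            # Resolve the relative href against the framed page, not the frameset.
            out.append({'title': text.strip(), 'url': urljoin(issue_url, href)})
    return out


issue_url = frame_url(FRAMESET_HTML, FRAMESET_URL)
# Derive the issue prefix ('751/') from the frame's file name instead of hard-coding it.
prefix = issue_url.rsplit('/', 1)[-1].split('.')[0] + '/'
articles = article_links(ISSUE_HTML, issue_url, prefix)
for art in articles:
    print(art['title'], 'at', art['url'])
```

If this guess is right, the same two steps inside parse_index would presumably be one index_to_soup call on semframe.html to read the frame's src, then a second index_to_soup call on the resolved URL to collect the links, with each href turned into an absolute URL before it is appended.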