03-18-2022, 11:17 AM   #1
unkn0wn
India Seminar Magazine

This website has a really old interface but really good content.

https://www.india-seminar.com/

This link goes directly to the current issue: https://www.india-seminar.com/semframe.html

The article pages contain only text [example], so auto_cleanup will work on them.

All I want is for the recipe to parse the article links into a feed.

Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes
        
class Seminar(BasicNewsRecipe):
    title = 'India-Seminar Magazine'
    __author__ = 'unkn0wn'
    description = 'Seminar - attempts a departure from the usual journal. Problems, national and international, are posed and discussed. Each issue deals with a single problem. Those who hold different and at times opposing viewpoints express their thoughts'
    language = 'en_GB'
    encoding = 'utf-8'
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
    ignore_duplicate_articles = {'url'}
    
    auto_cleanup = True
        
    def parse_index(self):
        soup = self.index_to_soup('https://www.india-seminar.com/semframe.html')
        ans = []

        for a in soup.findAll('a', href=lambda x: x and x.startswith('751/')):
            url = a['href']
            title = self.tag_to_string(a)
            self.log(title, ' at ', url)
            ans.append({'title': title, 'url': url})
        return [('Articles', ans)]
# 751/ would need to be changed to 752/ in the next month.
The 'hrefs' can be found here:

<frame src="2022/751.htm" target="_self" name="action">
#document
html body div.WordSection1 p.MsoNormal b span a

I've tried so many things, but nothing worked. I thought this would be really easy.

It won't even give me an error, just an empty EPUB.
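
One possible explanation, going by the frame markup above: semframe.html is only a frameset, so the soup built from it may contain no article links at all. The links would live in the framed page (2022/751.htm), and their hrefs would be relative to that page. Below is a minimal sketch of that idea, using only the standard library so it can be tried outside calibre. The helper names (find_tags, issue_page_url, article_urls) are hypothetical, and it assumes the article hrefs keep the same prefix as the issue file name (751.htm gives 751/), which would also avoid editing the number every month.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TagCollector(HTMLParser):
    """Collect the attribute dict of every start tag with a given name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.found.append(dict(attrs))


def find_tags(html, tag):
    """Return a list of attribute dicts for each <tag ...> in html."""
    parser = TagCollector(tag)
    parser.feed(html)
    parser.close()
    return parser.found


def issue_page_url(frameset_html, frameset_url):
    """Absolute URL of the frame named 'action', or None if it is missing."""
    for attrs in find_tags(frameset_html, 'frame'):
        if attrs.get('name') == 'action' and attrs.get('src'):
            return urljoin(frameset_url, attrs['src'])
    return None


def article_urls(issue_html, issue_url):
    """Absolute URLs of links whose href starts with the issue number.

    Assumption: an issue page named 751.htm links to articles under 751/.
    """
    file_name = issue_url.rsplit('/', 1)[-1]   # e.g. '751.htm'
    prefix = file_name.split('.')[0] + '/'     # e.g. '751/'
    urls = []
    for attrs in find_tags(issue_html, 'a'):
        href = attrs.get('href') or ''
        if href.startswith(prefix):
            # Relative hrefs are resolved against the issue page, not the frameset.
            urls.append(urljoin(issue_url, href))
    return urls
```

In the recipe itself, the two HTML strings would come from fetching semframe.html first and then the URL returned by issue_page_url. The hrefs are made absolute because a bare '751/...' path presumably cannot be downloaded on its own.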

Last edited by unkn0wn; 03-18-2022 at 01:26 PM.