#1
Guru
Posts: 615
Karma: 85520
Join Date: May 2021
Device: kindle
India Seminar Magazine
This website has a really old interface, but really good content.
https://www.india-seminar.com/
This link, https://www.india-seminar.com/semframe.html, takes you directly to the present issue. The articles it links to contain only text [example], for which auto_cleanup will work. All I want is for the recipe to parse the article links into a feed.
Code:
    from calibre.web.feeds.news import BasicNewsRecipe, classes


    class Seminar(BasicNewsRecipe):
        title = 'India-Seminar Magazine'
        __author__ = 'unkn0wn'
        description = (
            'Seminar - attempts a departure from the usual journal. Problems, national and '
            'international, are posed and discussed. Each issue deals with a single problem. '
            'Those who hold different and at times opposing viewpoints express their thoughts'
        )
        language = 'en_GB'
        encoding = 'utf-8'
        use_embedded_content = False
        no_stylesheets = True
        remove_javascript = True
        masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
        ignore_duplicate_articles = {'url'}
        auto_cleanup = True

        def parse_index(self):
            soup = self.index_to_soup('https://www.india-seminar.com/semframe.html')
            ans = []
            for a in soup.findAll('a', href=lambda x: x and x.startswith('751/')):
                url = a['href']
                title = self.tag_to_string(a)
                self.log(title, ' at ', url)
                ans.append({'title': title, 'url': url})
            return [('Articles', ans)]

    # 751/ would need to be changed to 752/ in the next month.

    <frame src="2022/751.htm" target="_self" name="action">
    #document html body div.WordSection1 p.MsoNormal b span a

I've tried so many things; nothing worked. I thought this would be really easy. It won't even give me an error, just an empty EPUB.
Last edited by unkn0wn; 03-18-2022 at 01:26 PM.

#2
creator of calibre
Posts: 45,330
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You don't load https://www.india-seminar.com/semframe.html; instead, you load the inner document 2022/751.htm and parse that.
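A minimal sketch of why parsing the inner document helps, using only the Python standard library rather than calibre. The sample markup and the article file name `01_example.htm` are hypothetical; only the inner URL and the `751/` prefix come from the posts above. It collects the anchors whose href starts with the issue prefix and resolves them against the inner document's URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a> tags whose href starts with a prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []      # finished (href, title) pairs
        self._href = None    # href of the matching <a> currently open, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            # Collapse runs of whitespace and newlines in the link text.
            title = ' '.join(''.join(self._text).split())
            self.links.append((self._href, title))
            self._href = None


def article_links(inner_html, inner_url, prefix):
    """Return [{'title': ..., 'url': ...}] with URLs made absolute."""
    parser = LinkCollector(prefix)
    parser.feed(inner_html)
    parser.close()
    return [{'title': t, 'url': urljoin(inner_url, h)} for h, t in parser.links]


# Hypothetical markup shaped like the DOM path quoted in post #1.
SAMPLE = '''
<div class="WordSection1">
<p class="MsoNormal"><b><span><a href="751/01_example.htm">Example
 article</a></span></b></p>
<p class="MsoNormal"><a href="../semsearch.htm">Search</a></p>
</div>
'''
INNER_URL = 'https://www.india-seminar.com/2022/751.htm'

print(article_links(SAMPLE, INNER_URL, '751/'))
```

The frameset page itself contains no `<a>` tags for the articles, which would explain the empty EPUB without any error.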

#3
Guru
Code:
    from calibre.web.feeds.news import BasicNewsRecipe
    from datetime import datetime


    class Seminar(BasicNewsRecipe):
        title = 'India-Seminar Magazine'
        __author__ = 'unkn0wn'
        description = (
            'Seminar - attempts a departure from the usual journal. Problems, national and '
            'international, are posed and discussed. Each issue deals with a single problem. '
            'Those who hold different and at times opposing viewpoints express their thoughts'
        )
        language = 'en_GB'
        use_embedded_content = False
        remove_javascript = True
        masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
        ignore_duplicate_articles = {'url'}

        def parse_index(self):
            d = datetime.today()
            # Load the inner document of the frame, not the frameset page.
            soup = self.index_to_soup('https://www.india-seminar.com/' + d.strftime('%Y') + '/751.htm')
            ans = []
            for a in soup.findAll('a', href=lambda x: x):
                url = a['href']
                # Keep only links into the issue folder and make them absolute.
                if url.startswith('751/'):
                    url = 'https://www.india-seminar.com/' + d.strftime('%Y') + '/' + url
                    title = self.tag_to_string(a)
                    self.log(title, ' at ', url)
                    ans.append({'title': title, 'url': url})
            return [('Articles', ans)]

Like using %m (the month number) and adding it to 748, so that Jan will be 748 + 1 and Dec will become 748 + 12.
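For reference, the month arithmetic described above can be sketched as follows. The base value 748 and the Jan/Dec examples come from the post; the function name and the extension to years after 2022 are assumptions, and the whole idea only holds if exactly one issue appears every month without gaps.

```python
from datetime import date


def guessed_issue_number(day):
    """Guess the issue number as 748 plus the months elapsed since Dec 2021.

    Assumes one issue per month with no skipped or combined issues.
    """
    months_since_base = (day.year - 2022) * 12 + day.month
    return 748 + months_since_base


print(guessed_issue_number(date(2022, 3, 18)))  # 751, the issue in this thread
```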

#4
creator of calibre
You shouldn't need to calculate the month; just get it from the href (the frame's src attribute) in the page containing the frame.
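A rough sketch of that approach with the standard library only, not the recipe that was finally used. The `<frame src="2022/751.htm" ...>` tag is quoted from post #1; the surrounding frameset markup, the `semleft.htm` frame, and the helper names are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

FRAMESET_URL = 'https://www.india-seminar.com/semframe.html'


class FrameFinder(HTMLParser):
    """Record the src attribute of every <frame> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'frame':
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def current_issue(frameset_html, frameset_url=FRAMESET_URL):
    """Return (inner document URL, link prefix) for the current issue."""
    finder = FrameFinder()
    finder.feed(frameset_html)
    finder.close()
    for src in finder.sources:
        # Expect something like '2022/751.htm': a year folder and an issue number.
        m = re.fullmatch(r'(\d{4})/(\d+)\.htm', src)
        if m:
            return urljoin(frameset_url, src), m.group(2) + '/'
    raise ValueError('no issue frame found in frameset page')


# Hypothetical frameset page; only the second <frame> is taken from the thread.
SAMPLE = (
    '<frameset cols="20%,80%">'
    '<frame src="semleft.htm" name="menu">'
    '<frame src="2022/751.htm" target="_self" name="action">'
    '</frameset>'
)

print(current_issue(SAMPLE))
```

Inside a recipe, the same lookup would presumably be done on the soup returned by `self.index_to_soup()` for the frameset page, and the resulting URL and prefix would replace the hard-coded year and `751/`.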

#5
Guru
Oh, thanks.
It works well now.