#1
Guru
Posts: 646
Karma: 85520
Join Date: May 2021
Device: kindle
India Seminar Magazine
The website has a really old interface, but the content is really good.
https://www.india-seminar.com/
This link, https://www.india-seminar.com/semframe.html, takes you directly to the current issue. The article links contain only text [example], for which auto_cleanup will work. All I want is for the recipe to parse the article links into a feed.
Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes


class Seminar(BasicNewsRecipe):
    title = 'India-Seminar Magazine'
    __author__ = 'unkn0wn'
    description = 'Seminar - attempts a departure from the usual journal. Problems, national and international, are posed and discussed. Each issue deals with a single problem. Those who hold different and at times opposing viewpoints express their thoughts'
    language = 'en_GB'
    encoding = 'utf-8'
    use_embedded_content = False
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
    ignore_duplicate_articles = {'url'}
    auto_cleanup = True

    def parse_index(self):
        soup = self.index_to_soup('https://www.india-seminar.com/semframe.html')
        ans = []
        for a in soup.findAll('a', href=lambda x: x and x.startswith('751/')):
            url = a['href']
            title = self.tag_to_string(a)
            self.log(title, ' at ', url)
            ans.append({'title': title, 'url': url})
        return [('Articles', ans)]

# 751/ would need to be changed to 752/ in the next month.
<frame src="2022/751.htm" target="_self" name="action">
#document html body div.WordSection1 p.MsoNormal b span a
I've tried so many things and nothing worked. I thought this would be really easy. It won't even give me an error, just an empty EPUB.
Last edited by unkn0wn; 03-18-2022 at 02:26 PM.
#2
creator of calibre
Posts: 45,718
Karma: 28549306
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You don't load https://www.india-seminar.com/semframe.html; instead you load the inner document, 2022/751.htm, and parse that.
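To illustrate why the first recipe produced an empty result, here is a minimal standard-library sketch (not calibre code). The `FRAMESET_HTML` string is a hypothetical stand-in for semframe.html; only the `action` frame line is taken from the post above. A frameset page of this shape has no `<a>` tags at all, so a search for article links finds nothing, while the frame's `src` points at the document that does hold them.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for semframe.html. Only the <frame> line is quoted
# from the thread; the surrounding markup is assumed for illustration.
FRAMESET_HTML = """
<html><frameset cols="20%,80%">
<frame src="2022/751.htm" target="_self" name="action">
</frameset></html>
"""


class TagCollector(HTMLParser):
    """Collect every <a href> and <frame src> value seen in a document."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.frames = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'a' and attrs.get('href'):
            self.links.append(attrs['href'])
        elif tag == 'frame' and attrs.get('src'):
            self.frames.append(attrs['src'])


collector = TagCollector()
collector.feed(FRAMESET_HTML)
print(collector.links)   # anchors found in the outer page
print(collector.frames)  # frame sources found in the outer page
```

In this sample the anchor list comes back empty and the frame list holds `2022/751.htm`, which is the page to parse for article links.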
#3
Guru
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from datetime import datetime


class Seminar(BasicNewsRecipe):
    title = 'India-Seminar Magazine'
    __author__ = 'unkn0wn'
    description = 'Seminar - attempts a departure from the usual journal. Problems, national and international, are posed and discussed. Each issue deals with a single problem. Those who hold different and at times opposing viewpoints express their thoughts'
    language = 'en_GB'
    use_embedded_content = False
    remove_javascript = True
    masthead_url = 'https://www.india-seminar.com/semlogo/semlogo_top_1.jpg'
    ignore_duplicate_articles = {'url'}

    def parse_index(self):
        d = datetime.today()
        # Load the inner issue page directly instead of the frameset page.
        soup = self.index_to_soup('https://www.india-seminar.com/' + d.strftime('%Y') + '/751.htm')
        ans = []
        for a in soup.findAll('a', href=lambda x: x):
            url = a['href']
            # Article links are relative and start with the issue number.
            if url.startswith('751/'):
                url = 'https://www.india-seminar.com/' + d.strftime('%Y') + '/' + url
                title = self.tag_to_string(a)
                self.log(title, ' at ', url)
                ans.append({'title': title, 'url': url})
        return [('Articles', ans)]
Can I do something like take %m (the month number) and add it to 748, so that January becomes 748+1 and December becomes 748+12?
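For reference, that arithmetic could be sketched as below. The base of 748 comes from this post; the `issue_number` helper and the extension beyond 2022 are assumptions, and they only hold if exactly one issue is published every month.

```python
from datetime import date


def issue_number(d, base=748, base_year=2022):
    """Issue number under the '748 + month' rule from the post.

    Assumes one issue per month, with January of base_year being base + 1.
    """
    return base + 12 * (d.year - base_year) + d.month


print(issue_number(date(2022, 3, 18)))  # 748 + 3
print(issue_number(date(2022, 12, 1)))  # 748 + 12
```

For March 2022 this gives 751, which matches the `2022/751.htm` frame source quoted earlier in the thread.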
#4
creator of calibre
You shouldn't need to calculate the month; just get it from the href (the frame's src) in the page containing the frame.
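A minimal sketch of that approach, using only the standard library rather than calibre's own helpers. The `issue_url` function, the regular expression, and the `751/example.htm` article name are hypothetical; the frame line and the base URL are the ones quoted in this thread.

```python
import re
from urllib.parse import urljoin

BASE = 'https://www.india-seminar.com/semframe.html'


def issue_url(frameset_html, base=BASE):
    """Return the absolute URL of the first frame whose src looks like 'YYYY/NNN.htm'."""
    m = re.search(r'<frame[^>]*\bsrc="(\d{4}/\d+\.htm)"', frameset_html, re.I)
    if m is None:
        raise ValueError('no issue frame found')
    # Resolve the relative src against the page that contains the frame.
    return urljoin(base, m.group(1))


sample = '<frame src="2022/751.htm" target="_self" name="action">'
issue = issue_url(sample)
print(issue)

# Article hrefs are relative to the issue page, so resolve them the same way.
# '751/example.htm' is a made-up article name used only for illustration.
print(urljoin(issue, '751/example.htm'))
```

Inside a recipe, the same idea would presumably be to call `index_to_soup` on semframe.html, read the `src` of the frame, and pass the resolved URL to a second `index_to_soup` call, so neither the year nor the issue number is hard-coded.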
#5
Guru
Oh, thanks.
It works well now.