I had a first stab at the recipe. There are a few problems with it:
1. Some articles are spread over multiple pages. How do I get the text from all the pages and merge it together?
2. The RSS section shows the same fixed text for each feed, e.g. "Amarujala News : A Hindi News Website covers Breaking India news samachar in hindi, News Headlines in hindi from every State of India, news on business, sports, bollywood, political and more only at Amarujala.com". How do I delete this?
Any pointers will be appreciated.
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AmarUjala(BasicNewsRecipe):
    feeds = [
        (u'National News',
         u'http://www.amarujala.com/rss/national-news.xml'),
        (u'International news',
         u'http://www.amarujala.com/rss/international-news.xml'),
        (u'Sports news',
         u'http://www.amarujala.com/rss/sports-news.xml'),
        (u'Business News',
         u'http://www.amarujala.com/rss/business-news.xml'),
        (u'Technology News',
         u'http://www.amarujala.com/rss/technology-news.xml'),
    ]
    title = u'Amar Ujala'
    masthead_url = 'http://epaper.amarujala.com/images/header_logo.gif'
    auto_cleanup = True
    oldest_article = 2.0  # days
    use_embedded_content = False
    language = 'hi_IN'
    publication_type = 'newspaper'
    remove_empty_feeds = True
    no_stylesheets = True
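
For what it's worth, here is a rough, untested-against-the-site sketch of the direction I am considering for both problems. Everything in it is my own assumption: the helper names (`collect_pages`, `merge_pages`, `blank_feed_descriptions`) are hypothetical, and `fetch_page` is a stand-in callable that returns a page's text plus the URL of the next page (or None). In the recipe I imagine `fetch_page` wrapping `self.index_to_soup(url)` and being called from `preprocess_html`, but I have not checked how Amar Ujala marks up its "next page" link. For problem 2, I am assuming the fixed text is the feed's description, which might be blanked from an overridden `parse_feeds`.

```python
def collect_pages(first_url, fetch_page, max_pages=10):
    """Follow 'next page' links from first_url and return the page bodies in order.

    fetch_page is a hypothetical callable: fetch_page(url) -> (body, next_url),
    where next_url is None on the last page.
    """
    bodies = []
    seen = set()
    url = first_url
    # Stop on the last page, on a link loop, or after max_pages pages.
    while url and url not in seen and len(bodies) < max_pages:
        seen.add(url)
        body, url = fetch_page(url)
        bodies.append(body)
    return bodies


def merge_pages(first_url, fetch_page, separator='\n'):
    """Join the bodies of all pages of one article into a single string."""
    return separator.join(collect_pages(first_url, fetch_page))


def blank_feed_descriptions(feeds):
    """Clear the description on each feed object (assumed to hold the fixed text)."""
    for feed in feeds:
        feed.description = ''
    return feeds
```

The `seen` set and the `max_pages` cap are only there so that a bad "next" link cannot make the download loop forever. If the fixed text turns out to live somewhere other than the feed description, the second helper will not help.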