#1 |
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Update to Live Mint recipe.
I mixed and matched most things from your other recipes.
I don't even know why I used some things here, like for script in soup.findAll('script'): script.extract(), but the recipe works great. Code:
    from calibre.web.feeds.news import BasicNewsRecipe, classes


    class LiveMint(BasicNewsRecipe):
        title = u'Live Mint - test'
        language = 'en_IN'
        __author__ = 'Krittika Goyal'
        oldest_article = 1  # days
        max_articles_per_feed = 50
        encoding = 'utf-8'
        use_embedded_content = False
        remove_attributes = ['style', 'height', 'width']
        keep_only_tags = [
            dict(name='h1'),
            dict(name='picture'),
            dict(name='figcaption'),
            classes('articleInfo FirstEle summary highlights paywall'),
        ]
        remove_tags = [
            classes('trendingSimilarHeight moreNews mobAppDownload label msgError msgOk')
        ]
        feeds = [
            ('Companies', 'https://www.livemint.com/rss/companies'),
            ('Opinion', 'https://www.livemint.com/rss/opinion'),
            ('Money', 'https://www.livemint.com/rss/money'),
            ('Economy', 'https://www.livemint.com/rss/economy/'),
            ('Politics', 'https://www.livemint.com/rss/politics'),
            ('Science', 'https://www.livemint.com/rss/science'),
            ('Industry', 'https://www.livemint.com/rss/industry'),
            ('Lounge', 'https://www.livemint.com/rss/lounge'),
            ('Education', 'https://www.livemint.com/rss/education'),
            ('Sports', 'https://www.livemint.com/rss/sports'),
            ('Technology', 'https://www.livemint.com/rss/technology'),
            ('News', 'https://www.livemint.com/rss/news'),
            ('Mutual Funds', 'https://www.livemint.com/rss/Mutual Funds'),
            ('Markets', 'https://www.livemint.com/rss/markets'),
            ('AI', 'https://www.livemint.com/rss/AI'),
            ('Insurance', 'https://www.livemint.com/rss/insurance'),
            ('Budget', 'https://www.livemint.com/rss/budget'),
            ('Elections', 'https://www.livemint.com/rss/elections'),
        ]

        def preprocess_raw_html(self, raw_html, url):
            from calibre.ebooks.BeautifulSoup import BeautifulSoup
            soup = BeautifulSoup(raw_html)
            for script in soup.findAll('script'):
                script.extract()
            for style in soup.findAll('style'):
                style.extract()
            for img in soup.findAll('img', attrs={'data-src': True}):
                img['src'] = img['data-src']
            return str(soup)

        calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
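On the "don't know why I used it" part: the two extract() loops in preprocess_raw_html drop script and style elements so that none of that text ends up in the article, and the last loop copies data-src into src, which is where lazy-loading pages commonly keep the real image URL. Below is a rough stand-alone sketch of the same idea. It deliberately uses the standard library's re module instead of calibre's BeautifulSoup so it can be run outside calibre, and the names clean_html and promote_lazy_src are made up for illustration. Regex-based HTML handling is fragile, so treat this as a demonstration of the effect, not a replacement for the recipe code.

```python
import re

# Matches a whole <script>...</script> or <style>...</style> element.
SCRIPT_OR_STYLE = re.compile(
    r'<(script|style)\b[^>]*>.*?</\1\s*>', re.IGNORECASE | re.DOTALL)
IMG_TAG = re.compile(r'<img\b[^>]*>', re.IGNORECASE)
DATA_SRC = re.compile(r'\bdata-src\s*=\s*"([^"]*)"', re.IGNORECASE)
# A plain src attribute; the lookbehind keeps it from matching inside data-src.
PLAIN_SRC = re.compile(r'(?<![-\w])src\s*=\s*"[^"]*"', re.IGNORECASE)


def promote_lazy_src(tag):
    """Copy the data-src value of one <img> tag into its src attribute."""
    found = DATA_SRC.search(tag)
    if not found:
        return tag
    new_src = 'src="%s"' % found.group(1)
    if PLAIN_SRC.search(tag):
        # Overwrite the placeholder src with the real URL.
        return PLAIN_SRC.sub(lambda _: new_src, tag, count=1)
    # No src attribute yet: add one right after "<img".
    return tag[:4] + ' ' + new_src + tag[4:]


def clean_html(raw_html):
    """Strip script/style elements, then fix lazy-loaded images."""
    without_code = SCRIPT_OR_STYLE.sub('', raw_html)
    return IMG_TAG.sub(lambda m: promote_lazy_src(m.group(0)), without_code)
```

Without the data-src step, images that only carry a placeholder in src would likely be downloaded as the placeholder, or not at all.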
#2 |
creator of calibre
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#3 |
Guru
|
I found that the Saturday edition of Live Mint is called Lounge, and its feeds come from an entirely separate website,
so I created a new recipe for this weekly edition. Code:
    #!/usr/bin/env python
    from calibre.web.feeds.news import BasicNewsRecipe, classes


    class Lounge(BasicNewsRecipe):
        title = u'Live Mint-Lounge'
        language = 'en_IN'
        __author__ = 'unkn0wn'
        oldest_article = 7  # days
        max_articles_per_feed = 50
        encoding = 'utf-8'
        use_embedded_content = False
        no_stylesheets = True
        remove_attributes = ['style', 'height', 'width']
        keep_only_tags = [
            dict(name='h1'),
            dict(name='h2', attrs={'id': 'story-summary-0'}),
            dict(name='picture'),
            dict(name='div', attrs={'class': 'innerBanCaption'}),
            dict(name='div', attrs={'id': 'date-display-before-content'}),
            dict(name='div', attrs={'class': 'storyContent'}),
        ]
        remove_tags = [
            classes('sidebarAdv similarStoriesClass moreFromSecClass')
        ]
        feeds = [
            ('News', 'https://lifestyle.livemint.com/rss/news'),
            ('Food', 'https://lifestyle.livemint.com/rss/food'),
            ('Fashion', 'https://lifestyle.livemint.com/rss/fashion'),
            ('How to Lounge', 'https://lifestyle.livemint.com/rss/how-to-lounge'),
            ('Smart Living', 'https://lifestyle.livemint.com/rss/smart-living'),
        ]

        def preprocess_html(self, soup):
            for img in soup.findAll('img', attrs={'data-src': True}):
                img['src'] = img['data-src']
            for img in soup.findAll('img', attrs={'data-img': True}):
                img['src'] = img['data-img']
            return soup

        calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
#4 | |
Guru
|
Quote:
|
|
#5 | |
Guru
|
Quote:
|
|
#6 |
creator of calibre
|
|
#7 |
Guru
|
HELP
I wanted to create a cover_url for the Mint e-paper, but it has to be taken after the JavaScript has loaded. I don't know how to make the recipe load the JS and then search for the JPG.

    def get_cover_url(self):
        soup = self.index_to_soup('https://epaper.livemint.com/Home/ArticleView')
        [...] =lambda x: x and x.endswith('01_mr.jpg')):

https://epsfs.hindustantimes.com/MIN...3b35_01_mr.jpg
The link that ends with 01_mr.jpg is the front page (not the top page with ads, but the actual front page). It can be found in the default page HTML itself, even if you don't navigate to it.

    masthead_url = 'https://images.livemint.com/static/livemint-logo-v2.svg'

Last edited by unkn0wn; 04-04-2022 at 09:41 AM.
#8 |
creator of calibre
|
Recipes don't support JavaScript.
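If the front-page link really is present in the page source as post #7 says, one possible workaround that needs no JavaScript is to search the raw page text for it. The sketch below is only that: the helper name find_front_page_url and the pattern are assumptions made for illustration, the example.com URLs are placeholders, and nothing here has been checked against the live e-paper site.

```python
import re

# Assumption: the cover URL appears literally in the raw page text and ends
# in "01_mr.jpg", as described in post #7. The pattern is illustrative only.
COVER_RE = re.compile(r'https?://[^\s"\'<>\\]+01_mr\.jpg')


def find_front_page_url(raw_html):
    """Return the first URL ending in 01_mr.jpg, or None if there is none."""
    match = COVER_RE.search(raw_html)
    return match.group(0) if match else None


# Placeholder page text standing in for the e-paper HTML.
page = ('<script>var pages = ["https://example.com/p/x_00_mr.jpg", '
        '"https://example.com/p/x_01_mr.jpg"];</script>')
print(find_front_page_url(page))
```

Inside a recipe, get_cover_url could fetch the page text with self.index_to_soup(url, raw=True) and pass it to a helper like this one. That call may return bytes rather than str, so a decode step may be needed first. If the link is only added to the page by scripts at runtime, this approach will find nothing.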
|