02-17-2022, 11:29 AM   #1
unkn0wn
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
Update to the Live Mint recipe.

I mixed and matched most things from your other recipes.

I don't even know why I used some things here, like
for script in soup.findAll('script'):
    script.extract()

But the recipe works great.

Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes


class LiveMint(BasicNewsRecipe):
    title = u'Live Mint - test'
    language = 'en_IN'
    __author__ = 'Krittika Goyal'
    oldest_article = 1  # days
    max_articles_per_feed = 50
    encoding = 'utf-8'
    use_embedded_content = False
    remove_attributes = ['style', 'height', 'width']

    # Only these elements are kept from each article page.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='picture'),
        dict(name='figcaption'),
        classes('articleInfo FirstEle summary highlights paywall'),
    ]
    # Elements with these classes are dropped from the kept content.
    remove_tags = [
        classes('trendingSimilarHeight moreNews mobAppDownload label msgError msgOk')
    ]

    feeds = [
        ('Companies', 'https://www.livemint.com/rss/companies'),
        ('Opinion', 'https://www.livemint.com/rss/opinion'),
        ('Money', 'https://www.livemint.com/rss/money'),
        ('Economy', 'https://www.livemint.com/rss/economy/'),
        ('Politics', 'https://www.livemint.com/rss/politics'),
        ('Science', 'https://www.livemint.com/rss/science'),
        ('Industry', 'https://www.livemint.com/rss/industry'),
        ('Lounge', 'https://www.livemint.com/rss/lounge'),
        ('Education', 'https://www.livemint.com/rss/education'),
        ('Sports', 'https://www.livemint.com/rss/sports'),
        ('Technology', 'https://www.livemint.com/rss/technology'),
        ('News', 'https://www.livemint.com/rss/news'),
        # The space in the path is percent-encoded; the path is otherwise as posted.
        ('Mutual Funds', 'https://www.livemint.com/rss/Mutual%20Funds'),
        ('Markets', 'https://www.livemint.com/rss/markets'),
        ('AI', 'https://www.livemint.com/rss/AI'),
        ('Insurance', 'https://www.livemint.com/rss/insurance'),
        ('Budget', 'https://www.livemint.com/rss/budget'),
        ('Elections', 'https://www.livemint.com/rss/elections'),
    ]

    def preprocess_raw_html(self, raw_html, url):
        from calibre.ebooks.BeautifulSoup import BeautifulSoup
        soup = BeautifulSoup(raw_html)
        # Remove <script> and <style> elements from the page.
        for script in soup.findAll('script'):
            script.extract()
        for style in soup.findAll('style'):
            style.extract()
        # Copy the URL held in data-src into src for every <img> that has one.
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src']
        return str(soup)

calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
Attached Files
File Type: recipe Live Mint - test.recipe (2.3 KB, 149 views)
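
On the question of what the script loop in this recipe does: it removes every <script> element from the page before calibre processes it, and the following loop does the same for <style>. The sketch below imitates that effect with a regular expression from Python's standard library so it can run outside calibre. It is only a rough stand-in for the BeautifulSoup calls in the recipe, and the names strip_script_and_style and sample are made up for illustration.

```python
import re

# Matches a whole <script>...</script> or <style>...</style> element.
# A regex is only a rough stand-in for a real HTML parser such as the
# BeautifulSoup used in the recipe; it is enough for simple, well-formed input.
_SCRIPT_OR_STYLE = re.compile(
    r'<(script|style)\b[^>]*>.*?</\1\s*>',
    flags=re.IGNORECASE | re.DOTALL,
)


def strip_script_and_style(html):
    """Return html with every <script> and <style> element removed."""
    return _SCRIPT_OR_STYLE.sub('', html)


# Hypothetical page used only to show the effect.
sample = (
    '<html><head><style>p {color: red}</style></head>'
    '<body><p>Story text</p><script>track();</script></body></html>'
)
cleaned = strip_script_and_style(sample)
print(cleaned)
```

Only the markup that carries the article survives, which is probably why the same loop shows up in other recipes.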
02-17-2022, 12:19 PM   #2
kovidgoyal
creator of calibre
Posts: 45,596
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
https://github.com/kovidgoyal/calibr...8986cad2398efe
02-19-2022, 12:51 AM   #3
unkn0wn
I found that the Saturday edition of Live Mint is called Lounge, and its feeds come from an entirely different website (lifestyle.livemint.com).

So I created a new recipe for this weekly edition.


Code:
#!/usr/bin/env python

from calibre.web.feeds.news import BasicNewsRecipe, classes


class Lounge(BasicNewsRecipe):
    title = u'Live Mint-Lounge'
    language = 'en_IN'
    __author__ = 'unkn0wn'
    oldest_article = 7  # days
    max_articles_per_feed = 50
    encoding = 'utf-8'
    use_embedded_content = False
    no_stylesheets = True
    remove_attributes = ['style', 'height', 'width']

    # Only these elements are kept from each article page.
    keep_only_tags = [
        dict(name='h1'),
        dict(name='h2', attrs={'id': 'story-summary-0'}),
        dict(name='picture'),
        dict(name='div', attrs={'class': 'innerBanCaption'}),
        dict(name='div', attrs={'id': 'date-display-before-content'}),
        dict(name='div', attrs={'class': 'storyContent'}),
    ]
    # Elements with these classes are dropped from the kept content.
    remove_tags = [
        classes(
            'sidebarAdv similarStoriesClass moreFromSecClass'
        )
    ]

    feeds = [
        ('News', 'https://lifestyle.livemint.com/rss/news'),
        ('Food', 'https://lifestyle.livemint.com/rss/food'),
        ('Fashion', 'https://lifestyle.livemint.com/rss/fashion'),
        ('How to Lounge', 'https://lifestyle.livemint.com/rss/how-to-lounge'),
        ('Smart Living', 'https://lifestyle.livemint.com/rss/smart-living'),
    ]

    def preprocess_html(self, soup):
        # Copy the image URL from data-src, then from data-img, into src.
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src']
        for img in soup.findAll('img', attrs={'data-img': True}):
            img['src'] = img['data-img']
        return soup

calibre_most_common_ua = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36'
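
The preprocess_html step in this recipe copies the image URL out of data-src or data-img into src. Because the data-img loop runs second, it wins whenever an image carries both attributes. Here is a minimal sketch of that precedence using only the standard library's html.parser rather than calibre's soup object. The names resolve_src and ImgSrcCollector and the example.com URLs are hypothetical.

```python
from html.parser import HTMLParser


def resolve_src(attrs):
    """Mirror the recipe's two loops on one <img>'s attribute dict.

    data-src is applied first and data-img second, so data-img wins
    when both are present. Returns None if no source is available.
    """
    src = attrs.get('src')
    if 'data-src' in attrs:
        src = attrs['data-src']
    if 'data-img' in attrs:
        src = attrs['data-img']
    return src


class ImgSrcCollector(HTMLParser):
    """Collect the src each <img> would end up with after promotion."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        if tag == 'img':
            self.sources.append(resolve_src(dict(attrs)))


# Hypothetical markup: a lazy-loaded image, a data-img image, a plain image.
sample = (
    '<img src="placeholder.gif" data-src="https://example.com/a.jpg">'
    '<img data-img="https://example.com/b.jpg">'
    '<img src="https://example.com/c.jpg">'
)
collector = ImgSrcCollector()
collector.feed(sample)
collector.close()
print(collector.sources)
```

Images that already have a usable src and neither data attribute pass through unchanged, which matches the recipe's behaviour.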
02-19-2022, 12:54 AM   #4
unkn0wn
Quote:
Originally Posted by unkn0wn
[Live Mint recipe from post #1]

The Lounge feed in this recipe (https://www.livemint.com/rss/lounge) is always empty.
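
One way to confirm that a feed like this Lounge one really has no entries is to count the <item> elements in its XML. The sketch below does that on two made-up RSS documents with the standard library. The function name count_rss_items and both sample feeds are assumptions for illustration; for a live check, the feed text would first have to be downloaded.

```python
import xml.etree.ElementTree as ET


def count_rss_items(rss_text):
    """Return the number of <item> elements in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    return len(root.findall('./channel/item'))


# Hypothetical feeds: one with no items, one with two.
empty_feed = (
    '<rss version="2.0"><channel>'
    '<title>Lounge</title>'
    '</channel></rss>'
)
busy_feed = (
    '<rss version="2.0"><channel>'
    '<title>Companies</title>'
    '<item><title>Story one</title></item>'
    '<item><title>Story two</title></item>'
    '</channel></rss>'
)
print(count_rss_items(empty_feed))  # -> 0
print(count_rss_items(busy_feed))   # -> 2
```

A feed can also look empty in the downloaded e-book when it does have items, if every item is older than the recipe's oldest_article setting.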
02-19-2022, 02:30 AM   #5
unkn0wn
Quote:
Originally Posted by unkn0wn
[Live Mint-Lounge recipe from post #3]

LiveMint-Lounge.recipe
02-20-2022, 02:50 AM   #6
kovidgoyal
https://github.com/kovidgoyal/calibr...1c2a2d0c717e67
04-04-2022, 08:39 AM   #7
unkn0wn
HELP

I want to set a cover_url for the Mint e-paper, but the image URL should be taken after the page's JavaScript has loaded. I don't know how to make the recipe load the JavaScript and then search for the JPG. My incomplete attempt:

def get_cover_url(self):
    soup = self.index_to_soup('https://epaper.livemint.com/Home/ArticleView')

=lambda x: x and x.endswith('01_mr.jpg')):

https://epsfs.hindustantimes.com/MIN...3b35_01_mr.jpg

The link that ends with 01_mr.jpg is the front page (not the top page with ads, but the actual front page). It can be found in the default page's HTML itself, even if you don't navigate to it.
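
If the 01_mr.jpg link really is present in the page source as delivered, as described above, a plain text search may be enough and no JavaScript is needed. The sketch below is not a tested recipe method: it only shows the search step on a made-up page source, and find_cover_url, the sample markup and the img.example.com host are all hypothetical. Inside a recipe, the raw page text could probably be obtained with self.index_to_soup(url, raw=True).

```python
import re

# Assumption: the front-page image URL appears verbatim in the page source
# and ends with '01_mr.jpg', as described in the post.
_COVER_RE = re.compile(r'https?://[^\s"\'<>]+01_mr\.jpg')


def find_cover_url(raw_html):
    """Return the first URL ending in 01_mr.jpg found in raw_html, else None."""
    match = _COVER_RE.search(raw_html)
    return match.group(0) if match else None


# Hypothetical page source; the host and file names are made up.
sample = (
    '<script>var pages = ["https://img.example.com/2022/04/04/abc_00_mr.jpg",'
    ' "https://img.example.com/2022/04/04/abc_01_mr.jpg"];</script>'
)
print(find_cover_url(sample))
# -> https://img.example.com/2022/04/04/abc_01_mr.jpg
```

If the URL is only written into the page by JavaScript at run time, this search finds nothing and returns None.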

masthead_url = 'https://images.livemint.com/static/livemint-logo-v2.svg'

Last edited by unkn0wn; 04-04-2022 at 09:41 AM.
04-04-2022, 09:42 AM   #8
kovidgoyal
Recipes don't support JavaScript.

All times are GMT -4.