Old 04-04-2022, 03:56 AM   #5
unkn0wn
More cover URLs for other recipes

India Today update: https://github.com/kovidgoyal/calibr...a_today.recipe

Code:
    extra_css = '[itemprop^="description"] {font-size: small; font-style: italic;}'

    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/India-Today-Group/India-Today/News/')
        # Return the first <meta> content value that ends with 'view/3.jpg'.
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']
We can't get this cover from the recipe's default website, so it is taken from Magzter instead.
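For anyone who wants to see what that filter does outside calibre, here is a rough standalone sketch using only the standard library. It picks the first meta tag whose content ends with 'view/3.jpg', which is the same test the lambda applies. The names CoverMetaParser and first_cover_url are made up for this example, and the sample markup is invented; the real Magzter page may look different.

```python
from html.parser import HTMLParser


class CoverMetaParser(HTMLParser):
    """Remember the first <meta> whose content ends with a given suffix."""

    def __init__(self, suffix='view/3.jpg'):
        super().__init__()
        self.suffix = suffix
        self.cover_url = None

    def handle_starttag(self, tag, attrs):
        # Ignore non-meta tags, and stop looking once a match is stored.
        if tag != 'meta' or self.cover_url is not None:
            return
        content = dict(attrs).get('content')
        # Same guard as the recipe's lambda: skip tags with no content value.
        if content and content.endswith(self.suffix):
            self.cover_url = content


def first_cover_url(html, suffix='view/3.jpg'):
    """Return the first matching content value, or None if nothing matches."""
    parser = CoverMetaParser(suffix)
    parser.feed(html)
    return parser.cover_url


# Invented markup for illustration only (assumption, not the real page).
SAMPLE_HTML = '''
<html><head>
<meta name="viewport" content="width=device-width">
<meta charset="utf-8">
<meta property="og:image" content="https://example.com/covers/123/view/3.jpg">
</head><body></body></html>
'''

print(first_cover_url(SAMPLE_HTML))
```

As in the recipe code, nothing is returned when no tag matches, so the result is None in that case.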


THE WEEK
India

https://github.com/kovidgoyal/calibr...he_week.recipe

Cover URL and other updates:

Code:
    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/Malayala_Manorama/THE_WEEK/Business/')
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']
The cover image used by the present recipe is very low quality.

Remove everything from line 36 to line 57 (the end) of the present recipe. As it stands, the recipe won't load the images inside the article text (the image URLs are in the src attribute). Add the following in its place:

Code:
    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', attrs={'class': ['article-title', 'article-image', 'articlecontentbody section']}),
    ]

    remove_tags = [
        dict(name='div', attrs={'class': 'highlights section'}),
    ]
Financial Express

Cover URL:
Code:
    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./Financial-Express-Mumbai/Business/')
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']