MobileRead Forums - View Single Post

unkn0wn · 04-04-2022, 03:56 AM

India Today update: https://github.com/kovidgoyal/calibr...a_today.recipe

Code:

    extra_css = '[itemprop^="description"] {font-size: small; font-style: italic;}'

    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/India-Today-Group/India-Today/News/')
        # Use the first <meta> tag whose content ends with 'view/3.jpg' as the cover
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']

We can't get this cover from the default website.
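
To try the meta-tag lookup outside calibre, here is a minimal sketch using only the standard library. It assumes, as the recipe code above implies, that the cover is exposed in a meta tag whose content ends with view/3.jpg. The names CoverMetaParser and find_cover_url and the sample HTML are made up for illustration; they are not part of calibre or Magzter.

```python
from html.parser import HTMLParser


class CoverMetaParser(HTMLParser):
    """Collects the first <meta> content value that ends with a given suffix."""

    def __init__(self, suffix='view/3.jpg'):
        super().__init__()
        self.suffix = suffix
        self.cover_url = None

    def handle_starttag(self, tag, attrs):
        # Keep only the first match, mirroring the early return in the recipe.
        if tag != 'meta' or self.cover_url is not None:
            return
        content = dict(attrs).get('content')
        if content and content.endswith(self.suffix):
            self.cover_url = content


def find_cover_url(html, suffix='view/3.jpg'):
    """Return the matching meta content, or None if nothing matches."""
    parser = CoverMetaParser(suffix)
    parser.feed(html)
    return parser.cover_url


# Hypothetical sample markup, not copied from Magzter.
SAMPLE_HTML = '''
<html><head>
<meta name="description" content="Some magazine">
<meta property="og:image" content="https://example.com/covers/123/view/3.jpg">
</head><body></body></html>
'''

print(find_cover_url(SAMPLE_HTML))
```

If no meta tag matches, find_cover_url returns None, just as the recipe method falls through without returning a URL.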

THE WEEK India

https://github.com/kovidgoyal/calibr...he_week.recipe

Cover URL and other updates.

Code:

    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/Malayala_Manorama/THE_WEEK/Business/')
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']

The quality of the cover image in the present recipe is very low.

Remove everything from line 36 to line 57 (the end) of the present recipe, which won't load images within the article text (the images are within the src tag), and add the following:

Code:

    keep_only_tags = [
        dict(name='h1'),
        dict(name='div', attrs={'class': ['article-title', 'article-image', 'articlecontentbody section']}),
    ]

    remove_tags = [
        dict(name='div', attrs={'class': 'highlights section'}),
    ]

Financial Express

Cover URL:

Code:

    def get_cover_url(self):
        soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./Financial-Express-Mumbai/Business/')
        for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
            return citem['content']