02-23-2022, 05:17 AM | #1 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Update Indian Express
Fixing some tags and removing unnecessary banners.
https://github.com/kovidgoyal/calibr...00db42e010677d
Code:
remove_attributes = ['style', 'height', 'width']
ignore_duplicate_articles = {'url'}
keep_only_tags = [
    classes('heading-part full-details')
]
remove_tags = [
    dict(name='nav', attrs={'class':'ie-breadcrumb'}),
    dict(name='div', attrs={'id':'ie_story_comments'}),
    dict(name='div', attrs={'class':['ie-int-campign-ad','custom_read_button','unitimg','copyright']}),
    dict(name='img', attrs={'src':'https://images.indianexpress.com/2021/06/explained-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href':'https://indianexpress.com/section/explained/?utm_source=newbanner'}),
    dict(name='img', attrs={'src':'https://images.indianexpress.com/2021/06/opinion-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href':'https://indianexpress.com/section/opinion/?utm_source=newbanner'}),
    classes('share-social appstext storytags pdsc-related-modify news-guard'),
]
Last edited by unkn0wn; 02-23-2022 at 05:20 AM. |
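For anyone reading along without the calibre source handy: the classes(...) entries in the recipe code rely on a small helper that recipes usually import from calibre.web.feeds.news. Below is a minimal standalone sketch of the idea, as I understand it: the helper builds a matcher that accepts any element carrying at least one of the listed class names. The helper is re-declared here purely for illustration (it is a stand-in, not a copy of calibre's code), and `matches` is a hypothetical name used only to exercise it.

```python
def classes(names):
    """Build a matcher dict that accepts any element carrying at least one
    of the space-separated class names (stand-in for calibre's helper)."""
    wanted = frozenset(names.split())
    return dict(attrs={
        'class': lambda value: bool(value) and bool(wanted.intersection(value.split()))
    })


def matches(spec, class_attr):
    """Hypothetical helper: apply the matcher to a raw class attribute value."""
    return spec['attrs']['class'](class_attr)


spec = classes('share-social appstext storytags pdsc-related-modify news-guard')
print(matches(spec, 'storytags clearfix'))  # shares one class with the list
print(matches(spec, 'full-details'))        # shares no class with the list
print(matches(spec, None))                  # element without a class attribute
```

So a single classes('a b c') entry in remove_tags stands in for several dict(...) entries, one per class name.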
02-23-2022, 09:41 AM | #2 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
04-04-2022, 02:15 AM | #3 | ||||
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
update
Quote:
Quote:
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./The-Indian-Express-Mumbai/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
Quote:
more feeds
Quote:
Last edited by unkn0wn; 04-04-2022 at 03:09 AM. |
||||
04-04-2022, 03:18 AM | #4 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Also, a cover URL for Hindustan Times
https://github.com/kovidgoyal/calibr...n_times.recipe
I found that it's much easier to get covers from Magzter.
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/HT-Digital-Streams-Ltd./Hindustan-Times-Delhi/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
|
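To spell out what the get_cover_url snippet is doing: it scans the Magzter page for a <meta> tag whose content attribute ends in view/3.jpg and returns that URL. Here is a minimal standalone sketch of the same lookup using only the standard library's html.parser instead of calibre's index_to_soup. The names CoverFinder and find_cover_url, and the sample HTML (including the example.invalid URL), are made up for illustration; the real Magzter markup may differ.

```python
from html.parser import HTMLParser


class CoverFinder(HTMLParser):
    """Remember the first <meta> content value that ends with the given suffix."""

    def __init__(self, suffix='view/3.jpg'):
        super().__init__()
        self.suffix = suffix
        self.cover_url = None

    def handle_starttag(self, tag, attrs):
        # Only <meta> tags matter, and only until the first match is found.
        if tag != 'meta' or self.cover_url is not None:
            return
        content = dict(attrs).get('content')
        if content and content.endswith(self.suffix):
            self.cover_url = content


def find_cover_url(html, suffix='view/3.jpg'):
    """Return the matching URL, or None when the page has no such <meta> tag."""
    finder = CoverFinder(suffix)
    finder.feed(html)
    return finder.cover_url


# Made-up page fragment for illustration only.
sample = '''
<html><head>
<meta property="og:title" content="Some Newspaper">
<meta property="og:image" content="https://example.invalid/covers/123/view/3.jpg">
</head></html>
'''
print(find_cover_url(sample))
```

Returning None when nothing matches mirrors the recipe snippet, whose loop simply falls through without a return if no tag is found.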
04-04-2022, 03:56 AM | #5 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
More cover URLs for other recipes
India Today update https://github.com/kovidgoyal/calibr...a_today.recipe
Code:
extra_css = '[itemprop^="description"] {font-size: small; font-style: italic;}'

def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/India-Today-Group/India-Today/News/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
THE WEEK India https://github.com/kovidgoyal/calibr...he_week.recipe
Cover URL and other updates.
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Malayala_Manorama/THE_WEEK/Business/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
Remove everything from line 36 to line 57 (the end). The present recipe won't load images within the text of the article (the images are in the src attribute). Add the code below.
Code:
keep_only_tags = [
    dict(name='h1'),
    dict(name='div', attrs={'class':['article-title','article-image','articlecontentbody section']}),
]
remove_tags = [
    dict(name='div', attrs={'class':'highlights section'}),
]
Cover URL
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./Financial-Express-Mumbai/Business/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
|
04-06-2022, 11:09 AM | #6 | ||
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Times of India
Quote:
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Bennett-Coleman-and-Company-Limited/The-Times-of-India-Delhi/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
LiveMint
Why not use the same image every day for LiveMint?
Quote:
Last edited by unkn0wn; 04-06-2022 at 11:56 AM. |
||
04-07-2022, 06:07 AM | #7 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, the stats are up-to-date
|
04-07-2022, 01:03 PM | #8 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Bennett-Coleman-and-Company-Limited/The-Times-of-India-Delhi/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
It's just that fetching the daily front page as the cover makes it much more interesting. Sorry I kept asking you to make so many changes.
|
04-07-2022, 09:31 PM | #9 |
creator of calibre
Posts: 43,842
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you send pull requests on GitHub, it will be easier to ensure I don't miss anything.
|
04-30-2022, 12:54 AM | #10 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Removing new div in Indian Express
(This new div is adding too much unnecessary stuff to all the articles.)
Add premium-story to the remove_tags classes.
New feeds:
('Political Pulse', 'https://indianexpress.com/section/india/political-pulse/feed/'),
('India', 'https://indianexpress.com/section/india/feed/'),
Last edited by unkn0wn; 04-30-2022 at 12:56 AM. |
05-02-2022, 02:25 AM | #11 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
Similar problem with Financial Express
https://github.com/kovidgoyal/calibr...e_india.recipe
Add remove_tags = [classes('parent_also_read')]
|
05-02-2022, 06:01 PM | #12 |
Member
Posts: 21
Karma: 10
Join Date: Apr 2022
Device: android tablet
|
|
05-03-2022, 02:40 AM | #13 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
The changes were already made. Load from the default IE recipe.
|
05-08-2022, 12:32 PM | #14 |
Member
Posts: 21
Karma: 10
Join Date: Apr 2022
Device: android tablet
|
|
05-09-2022, 12:56 AM | #15 |
Evangelist
Posts: 442
Karma: 82686
Join Date: May 2021
Device: kindle
|
No, just use the default recipe.
|
|