|
|
#1 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Update Indian Express
Fixing some tags and removing unnecessary banners.
https://github.com/kovidgoyal/calibr...00db42e010677d
Code:
remove_attributes = ['style', 'height', 'width']
ignore_duplicate_articles = {'url'}
keep_only_tags = [
    classes('heading-part full-details')
]
remove_tags = [
    dict(name='nav', attrs={'class': 'ie-breadcrumb'}),
    dict(name='div', attrs={'id': 'ie_story_comments'}),
    dict(name='div', attrs={'class': ['ie-int-campign-ad', 'custom_read_button', 'unitimg', 'copyright']}),
    # "Explained" and "Opinion" promo banners (image plus its link)
    dict(name='img', attrs={'src': 'https://images.indianexpress.com/2021/06/explained-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href': 'https://indianexpress.com/section/explained/?utm_source=newbanner'}),
    dict(name='img', attrs={'src': 'https://images.indianexpress.com/2021/06/opinion-button-300-ie.jpeg'}),
    dict(name='a', attrs={'href': 'https://indianexpress.com/section/opinion/?utm_source=newbanner'}),
    classes('share-social appstext storytags pdsc-related-modify news-guard'),
]
Last edited by unkn0wn; 02-23-2022 at 05:20 AM. |
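For readers unfamiliar with the `classes(...)` entries above: in a real recipe the helper is imported from `calibre.web.feeds.news`, and it builds a matcher for any tag that carries at least one of the space-separated class names. The snippet below is a simplified stand-in written for illustration (it is not calibre's own code), so it can be run without calibre installed.

```python
def classes(names):
    """Simplified stand-in for calibre's `classes` helper: match any tag
    carrying at least one of the space-separated class names."""
    wanted = frozenset(names.split())
    return dict(attrs={
        # `value` is the tag's class attribute as a string, or None if absent.
        'class': lambda value: bool(value) and bool(frozenset(value.split()) & wanted)
    })


spec = classes('share-social appstext storytags pdsc-related-modify news-guard')
matches = spec['attrs']['class']

print(matches('storytags clearfix'))  # True: shares "storytags" with the wanted set
print(matches('full-details'))        # False: no overlap
print(matches(None))                  # False: tag has no class attribute
```

So `classes('heading-part full-details')` in `keep_only_tags` keeps tags that have either class, not only tags that have both.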
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,597
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
|
|
|
|
|
|
|
#3 | ||||
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
update
Quote:
Quote:
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./The-Indian-Express-Mumbai/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
Quote:
more feeds Quote:
Last edited by unkn0wn; 04-04-2022 at 03:09 AM. |
||||
|
|
|
|
|
#4 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Also, a cover URL for Hindustan Times
https://github.com/kovidgoyal/calibr...n_times.recipe
I found that it's much easier to get covers from Magzter.
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/HT-Digital-Streams-Ltd./Hindustan-Times-Delhi/Newspaper/')
    # Use the first <meta> tag whose content attribute ends in 'view/3.jpg'
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
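The lookup in these snippets amounts to "take the first `<meta>` tag whose `content` attribute ends in `view/3.jpg`". Here is a minimal sketch of the same search using only the standard library, so it runs without calibre or network access. The names `CoverFinder` and `find_cover_url` and the sample HTML are made up for illustration; the real Magzter markup may differ.

```python
from html.parser import HTMLParser


class CoverFinder(HTMLParser):
    """Record the first <meta content="..."> value ending in a given suffix."""

    def __init__(self, suffix='view/3.jpg'):
        super().__init__()
        self.suffix = suffix
        self.cover_url = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.cover_url is not None:
            return
        content = dict(attrs).get('content')
        # Same guard as the recipe's lambda: skip tags without a content attribute.
        if content and content.endswith(self.suffix):
            self.cover_url = content


def find_cover_url(html, suffix='view/3.jpg'):
    finder = CoverFinder(suffix)
    finder.feed(html)
    return finder.cover_url


# Made-up page fragment; the URL and property name are illustrative only.
SAMPLE = '''
<head>
  <meta charset="utf-8">
  <meta name="description" content="Daily newspaper">
  <meta property="og:image" content="https://example.invalid/covers/123/view/3.jpg">
</head>
'''

print(find_cover_url(SAMPLE))
print(find_cover_url('<meta name="x" content="no cover here">'))
```

The second call prints `None`, which mirrors what the recipe method does when no tag matches: the `for` loop finishes without returning anything.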
|
|
|
|
|
|
#5 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
More cover URLs for other recipes
India Today update https://github.com/kovidgoyal/calibr...a_today.recipe
Code:
extra_css = '[itemprop^="description"] {font-size: small; font-style: italic;}'

def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/India-Today-Group/India-Today/News/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
THE WEEK India https://github.com/kovidgoyal/calibr...he_week.recipe
Cover URL and other updates.
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Malayala_Manorama/THE_WEEK/Business/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
Remove everything from line 36 to line 57 (the end). The present recipe won't load images within the text of the article (the images are in the src attribute). Add the following:
Code:
keep_only_tags = [
    dict(name='h1'),
    dict(name='div', attrs={'class': ['article-title', 'article-image', 'articlecontentbody section']}),
]
remove_tags = [
    dict(name='div', attrs={'class': 'highlights section'}),
]
Cover URL
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/The-Indian-Express-Ltd./Financial-Express-Mumbai/Business/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
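Since every `get_cover_url` in this thread differs only in the Magzter page URL, the shared part could in principle be factored out. The sketch below is only an illustration of that idea: `MagzterCoverMixin`, `magzter_url` and `cover_suffix` are hypothetical names, not part of calibre, and the `Fake*` classes are stubs standing in for a real recipe and its soup so the example runs offline.

```python
class MagzterCoverMixin:
    """Hypothetical mixin: a recipe sets `magzter_url` to its Magzter page."""

    magzter_url = None            # hypothetical attribute, not part of calibre
    cover_suffix = 'view/3.jpg'   # suffix used by the snippets in this thread

    def get_cover_url(self):
        if not self.magzter_url:
            return None
        soup = self.index_to_soup(self.magzter_url)
        for item in soup.findAll(
            'meta', content=lambda s: s and s.endswith(self.cover_suffix)
        ):
            return item['content']
        return None  # no matching <meta> tag found


# --- Stubs so the sketch runs without calibre or network access ---
class FakeSoup:
    def __init__(self, metas):
        self.metas = metas

    def findAll(self, name, content=None):
        # Keep only the fake <meta> tags whose content passes the predicate.
        return [m for m in self.metas if content(m.get('content'))]


class FakeRecipe:
    def index_to_soup(self, url):
        return FakeSoup([
            {'name': 'description'},
            {'content': 'https://example.invalid/covers/42/view/3.jpg'},
        ])


class IndiaTodayDemo(MagzterCoverMixin, FakeRecipe):
    magzter_url = 'https://www.magzter.com/IN/India-Today-Group/India-Today/News/'


print(IndiaTodayDemo().get_cover_url())
```

In an actual recipe the base class would be `BasicNewsRecipe` rather than `FakeRecipe`, and the mixin would have to be defined wherever the recipe can see it, which is an assumption about project layout rather than something this thread settles.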
|
|
|
|
|
|
|
|
#6 | ||
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Times of India
Quote:
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Bennett-Coleman-and-Company-Limited/The-Times-of-India-Delhi/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
LiveMint
Why not use the same image every day for LiveMint?
Quote:
Last edited by unkn0wn; 04-06-2022 at 11:56 AM. |
||
|
|
|
|
|
#7 |
|
creator of calibre
Posts: 45,597
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Yes, the stats are up-to-date
|
|
|
|
|
|
#8 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://www.magzter.com/IN/Bennett-Coleman-and-Company-Limited/The-Times-of-India-Delhi/Newspaper/')
    for citem in soup.findAll('meta', content=lambda s: s and s.endswith('view/3.jpg')):
        return citem['content']
It's just that fetching the daily front page as the cover makes it much more interesting. Sorry I kept asking you to make so many changes. |
|
|
|
|
|
#9 |
|
creator of calibre
Posts: 45,597
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you send pull requests on GitHub, it will be easier to ensure I don't miss anything.
|
|
|
|
|
|
#10 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Removing the new div in Indian Express
(This new div adds too much unnecessary content to all the articles.)
Add premium-story to the remove_tags classes.
New feeds:
('Political Pulse', 'https://indianexpress.com/section/india/political-pulse/feed/'),
('India', 'https://indianexpress.com/section/india/feed/'),
Last edited by unkn0wn; 04-30-2022 at 12:56 AM. |
|
|
|
|
|
#11 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
Similar problem with Financial Express
https://github.com/kovidgoyal/calibr...e_india.recipe
Add remove_tags = [classes('parent_also_read')] |
|
|
|
|
|
#12 |
|
Enthusiast
Posts: 35
Karma: 10
Join Date: Apr 2022
Device: android tablet
|
|
|
|
|
|
|
#13 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
The changes were already made. Load the default Indian Express recipe.
|
|
|
|
|
|
#14 |
|
Enthusiast
Posts: 35
Karma: 10
Join Date: Apr 2022
Device: android tablet
|
|
|
|
|
|
|
#15 |
|
Guru
Posts: 643
Karma: 85520
Join Date: May 2021
Device: kindle
|
No, just use the default recipe.
|
|
|
|