#1 |
Guru
Posts: 615
Karma: 85520
Join Date: May 2021
Device: kindle
|
Swarajya Magazine (Monthly)
Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes


class SwarajyaMag(BasicNewsRecipe):
    title = u'Swarajya Magazine'
    __author__ = 'unkn0wn'
    description = 'Swarajya - a big tent for liberal right of centre discourse that reaches out, engages and caters to the new India.'
    language = 'en_GB'
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    remove_attributes = ['height', 'width']
    encoding = 'utf-8'

    keep_only_tags = [
        classes('_2PqtR _1sMRD ntw8h author-bio'),
    ]

    remove_tags = [
        classes('_JscD _2r17a'),
    ]

    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src'].split('?')[0]
        return soup

    def parse_index(self):
        soup = self.index_to_soup('https://swarajyamag.com/issue/a-catalyst-for-growth')
        ans = []

        for a in soup.findAll(**classes('_2eOQr')):
            url = a['href']
            if url.startswith('/'):
                url = 'https://swarajyamag.com' + url
            title = self.tag_to_string(a)
            self.log(title, ' at ', url)
            ans.append({'title': title, 'url': url})
        return [('Articles', ans)]

Code:
<div class="_3BU_3">
  <a href="/issue/a-catalyst-for-growth">
    <img src="https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress&format=webp&w=360&dpr=1.0" data-src="https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress" alt="A Catalyst For Growth" sizes="( max-width: 500px ) 98vw, ( max-width: 768px ) 48vw, 23vw" class="qt-image gm-added gm-loaded gm-observing gm-observing-cb" loading="lazy" title="" style="">
    <noscript></noscript>
  </a>
</div>

Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://swarajyamag.com/')
    tag = soup.find(attrs={'class': '_3BU_3'})
    if tag:
        self.cover_url = tag.find('img')['src']
    return super().get_cover_url()

Help! Other than that, everything works great if you replace the magazine link each month.

Last edited by unkn0wn; 03-13-2022 at 07:41 AM.
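One thing that might be worth checking in the markup above: the `src` attribute points at a resized WebP rendition, while `data-src` carries the plain image URL. The following is only a sketch of that idea, not a confirmed fix. It prefers `data-src` and drops the query string, the same way `preprocess_html` in the recipe treats article images. The helper name `pick_cover_url`, the `FirstImgAttrs` class and the shortened `example.com` URLs are placeholders invented for illustration, and the code uses only the standard library so it can be tried outside calibre.

```python
from html.parser import HTMLParser


class FirstImgAttrs(HTMLParser):
    """Collect the attributes of the first <img> tag encountered."""

    def __init__(self):
        super().__init__()
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        if tag == 'img' and self.attrs is None:
            self.attrs = dict(attrs)


def pick_cover_url(html):
    """Return data-src (or src as a fallback) without its query string."""
    parser = FirstImgAttrs()
    parser.feed(html)
    if not parser.attrs:
        return None
    url = parser.attrs.get('data-src') or parser.attrs.get('src')
    return url.split('?')[0] if url else None


# Shortened stand-in for the cover markup quoted in the post (placeholder URLs).
snippet = (
    '<div class="_3BU_3"><a href="/issue/a-catalyst-for-growth">'
    '<img src="https://example.com/cover.jpg?auto=format&format=webp&w=360" '
    'data-src="https://example.com/cover.jpg?auto=format%2Ccompress">'
    '</a></div>'
)
cover = pick_cover_url(snippet)
print(cover)
```

Inside a recipe, the equivalent step would presumably be to read `tag.find('img')` and assign the cleaned value to `self.cover_url`, but that is untested here.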
#2 |
creator of calibre
Posts: 45,316
Karma: 27111242
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
|
#3 |
Guru
|
Thank you
a = soup.find('a', href=lambda x: x and x.startswith('/issue/'))

Does href=lambda x: x and x.startswith('/issue/') mean that it will select the first link whose href starts with /issue/?
#4 |
creator of calibre
|
It will do that, but the "x and" part is there to avoid errors if there are a tags with no href attribute.
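To see both points in isolation, here is a small standard-library sketch. It does not use calibre or BeautifulSoup. The `AnchorHrefs` class, the helper names and the sample markup (including the `/issue/older-issue` link) are made up for illustration. It assumes, as the reply above implies, that the filter function is handed `None` for an a tag that has no href.

```python
from html.parser import HTMLParser


class AnchorHrefs(HTMLParser):
    """Record the href of every <a> tag, using None when it is missing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.hrefs.append(dict(attrs).get('href'))


def is_issue_link(x):
    # Same test as the lambda in the thread: "x and" short-circuits on None.
    return x and x.startswith('/issue/')


def unguarded(x):
    # Without the guard, a missing href (None) raises AttributeError.
    return x.startswith('/issue/')


# Hypothetical page fragment: one anchor without href, then ordinary links.
page = (
    '<a name="top">anchor without href</a>'
    '<a href="/about">About</a>'
    '<a href="/issue/a-catalyst-for-growth">March</a>'
    '<a href="/issue/older-issue">Older</a>'
)
parser = AnchorHrefs()
parser.feed(page)

# The first href that passes the test, in document order.
first_issue = next(h for h in parser.hrefs if is_issue_link(h))
print(first_issue)

try:
    unguarded(parser.hrefs[0])
    guard_needed = False
except AttributeError:
    guard_needed = True
print(guard_needed)
```

Since the first match in document order wins, this approach only picks the current issue if the page lists the newest issue first.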