Old 03-13-2022, 07:28 AM   #1
unkn0wn
Guru
Swarajya Magazine (Monthly)

Code:
from calibre.web.feeds.news import BasicNewsRecipe, classes

class SwarajyaMag(BasicNewsRecipe):
    title = u'Swarajya Magazine'
    __author__ = 'unkn0wn'
    description = 'Swarajya - a big tent for liberal right of centre discourse that reaches out, engages and caters to the new India.'
    language = 'en_GB'
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    remove_attributes = ['height', 'width']
    encoding = 'utf-8'
    
    keep_only_tags = [
        classes('_2PqtR _1sMRD ntw8h author-bio'),
    ]

    remove_tags = [
        classes('_JscD _2r17a'),
    ]
    
    def preprocess_html(self, soup):
        for img in soup.findAll('img', attrs={'data-src': True}):
            img['src'] = img['data-src'].split('?')[0] 
        return soup
    
    def parse_index(self):
        # The issue URL is hardcoded here and has to be replaced each month.
        soup = self.index_to_soup('https://swarajyamag.com/issue/a-catalyst-for-growth')
        ans = []
        
        for a in soup.findAll(**classes('_2eOQr')):
            url = a['href']
            if url.startswith('/'):
                url = 'https://swarajyamag.com' + url
            title = self.tag_to_string(a)
            self.log(title, ' at ', url)
            ans.append({'title': title, 'url': url})
        return [('Articles', ans)]
The recipe above works, but as you can see I've hardcoded the link to the current issue instead of having the recipe find it automatically.

Code:
<div class="_3BU_3">

<a href="/issue/a-catalyst-for-growth">

<img src="https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress&amp;format=webp&amp;w=360&amp;dpr=1.0" data-src="https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress" alt="A Catalyst For Growth" sizes="( max-width: 500px ) 98vw, ( max-width: 768px ) 48vw, 23vw" class="qt-image gm-added gm-loaded gm-observing gm-observing-cb" loading="lazy" title="" style="">

<noscript></noscript></a></div>
The above is from the home page (https://swarajyamag.com/). The div with class '_3BU_3' contains both the href of the current issue (the magazine page to parse for articles) and the cover image, so it could be used to find the issue link and cover_url automatically.
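
A minimal sketch of that idea, using only the standard library's html.parser so it can be run outside calibre. IssueBlockParser is a made-up helper name, and the sample HTML is a trimmed copy of the snippet above with the long image URL shortened to a placeholder. Inside a recipe the same lookup would go through self.index_to_soup instead. The '_3BU_3' class name comes from the snippet and may well change when the site is rebuilt.

```python
from html.parser import HTMLParser


class IssueBlockParser(HTMLParser):
    """Collect the first link and image inside <div class="_3BU_3">.

    Hypothetical helper for illustration; not part of calibre.
    """

    def __init__(self, block_class='_3BU_3'):
        super().__init__()
        self.block_class = block_class
        self.depth = 0  # greater than 0 while inside the target div
        self.issue_href = None
        self.cover_src = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == 'div':
            if self.depth:
                self.depth += 1  # nested div inside the target block
            elif self.block_class in (attrs.get('class') or '').split():
                self.depth = 1   # entered the target block
            return
        if not self.depth:
            return
        if tag == 'a' and self.issue_href is None:
            self.issue_href = attrs.get('href')
        elif tag == 'img' and self.cover_src is None:
            # Prefer the lazy-load attribute, fall back to src.
            self.cover_src = attrs.get('data-src') or attrs.get('src')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1


# Trimmed sample; the image URL is a shortened placeholder.
SAMPLE = '''
<div class="_3BU_3">
<a href="/issue/a-catalyst-for-growth">
<img src="https://gumlet.assettype.com/cover.jpg?w=360"
     data-src="https://gumlet.assettype.com/cover.jpg?auto=format%2Ccompress"
     alt="A Catalyst For Growth">
<noscript></noscript></a></div>
<div class="other"><a href="/issue/older-issue">Older</a></div>
'''

parser = IssueBlockParser()
parser.feed(SAMPLE)
issue_url = 'https://swarajyamag.com' + parser.issue_href
cover_url = parser.cover_src.split('?')[0]
print(issue_url)
print(cover_url)
```

Only the link and image inside the '_3BU_3' block are picked up; the second /issue/ link in the sample sits outside it and is ignored.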

Code:
def get_cover_url(self):
    soup = self.index_to_soup('https://swarajyamag.com/')
    tag = soup.find(attrs={'class': '_3BU_3'})
    if tag:
        self.cover_url = tag.find('img')['src']
    return super().get_cover_url()
I tried the above for cover_url, but I knew it wouldn't work because I don't know how to apply the split I use for the article images: img['src'] = img['data-src'].split('?')[0]
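
One way the split could be applied to the cover, sketched as a plain function so it can be tried outside calibre. clean_image_url is a hypothetical helper name, and preferring data-src over src is an assumption based on the HTML quoted earlier. This is not necessarily how the recipe was eventually fixed.

```python
def clean_image_url(attrs):
    """Return the image URL without its query string, or None.

    `attrs` is any mapping of the <img> tag's attributes.
    Hypothetical helper for illustration.
    """
    url = attrs.get('data-src') or attrs.get('src')
    if not url:
        return None
    # Drop everything from '?' onward (the part that appears to carry
    # the image format and size parameters).
    return url.split('?')[0]


# Attribute values taken from the <img> tag quoted earlier in the thread.
img_attrs = {
    'src': 'https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress&format=webp&w=360&dpr=1.0',
    'data-src': 'https://gumlet.assettype.com/swarajya%2F2022-03%2F1cd6eed6-3f1f-4eff-b163-076ad9492ae4%2FMarch_2022_Cover.jpg?auto=format%2Ccompress',
}
cover = clean_image_url(img_attrs)
print(cover)
```

A BeautifulSoup tag also supports .get(), so inside get_cover_url the img tag itself could probably be passed in place of the dict.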

Help!

Other than that, everything works as long as you replace the magazine link each month.
Attached Files
File Type: recipe Swarajya Magazine.recipe (1.3 KB, 110 views)

Last edited by unkn0wn; 03-13-2022 at 07:41 AM.
Old 03-14-2022, 01:04 AM   #2
kovidgoyal
creator of calibre
https://github.com/kovidgoyal/calibr...f2caa453851b9f
Old 03-14-2022, 06:28 AM   #3
unkn0wn
Thank you

a = soup.find('a', href=lambda x: x and x.startswith('/issue/'))

Does href=lambda x: x and x.startswith('/issue/') mean that it will select the first link whose href starts with /issue/?
Old 03-14-2022, 07:16 AM   #4
kovidgoyal
Yes, it will. The "x and" part is there to avoid errors if there are a tags with no href attribute.
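
To illustrate the guard without needing calibre or BeautifulSoup, here is a small sketch that applies the same test to a plain list of href values. None stands in for an a tag that has no href, which is the value the filter function receives in that case. The example paths other than the March issue are made up.

```python
def is_issue_link(href):
    # Same test as the lambda: "href and ..." short-circuits on None,
    # so startswith() is never called on a missing attribute.
    return bool(href and href.startswith('/issue/'))


def unguarded(href):
    # The same test without the guard, for comparison.
    return href.startswith('/issue/')


# None represents an <a> tag with no href attribute.
hrefs = [None, '/', '/politics', '/issue/a-catalyst-for-growth', '/issue/older-issue']

# find() returns the first match, which next() mimics here.
first_match = next((h for h in hrefs if is_issue_link(h)), None)
print(first_match)

try:
    unguarded(None)
    failed_without_guard = False
except AttributeError:
    failed_without_guard = True
print(failed_without_guard)
```

The guarded version skips the None entry quietly, while the unguarded one raises AttributeError as soon as it meets it.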