Foreign Affairs cover fails

unkn0wn · 05-03-2022, 12:31 PM

Quote:

self.cover_url = soup.find(**classes('subscribe-callout-image'))['data-src'].split("|")[-1]
self.cover_url = self.cover_url.split('?')[0]
self.cover_url = self.cover_url.replace('_webp_issue_small_2x', '_webp_issue_large_2x')

https://github.com/kovidgoyal/calibr...affairs.recipe

I removed line 156 and changed the strings in the replace() call on line 157 (or just use replace('small', 'large')).

https://cdn-live.foreignaffairs.com/...over_large.jpg .webp ? itok=LUFlkUCK

If we remove .webp, the link stops working.
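
For anyone following along, here is a minimal sketch of what the three quoted lines do to the attribute value. The helper name large_cover_url and the sample data-src string are made up for illustration; only the split/replace steps come from the change above.

```python
def large_cover_url(data_src):
    """Mirror the three recipe lines on a plain string (hypothetical helper)."""
    # data-src may hold several '|'-separated candidates; keep the last one.
    url = data_src.split("|")[-1]
    # Drop the query string (e.g. '?itok=...'); the .webp suffix is kept.
    url = url.split("?")[0]
    # Ask for the large rendition instead of the small one.
    return url.replace("_webp_issue_small_2x", "_webp_issue_large_2x")


# Made-up value, shaped like the attribute the recipe reads.
sample = (
    "https://example.com/a/_webp_issue_small_1x/cover.jpg.webp?itok=AAA|"
    "https://example.com/a/_webp_issue_small_2x/cover.jpg.webp?itok=BBB"
)
print(large_cover_url(sample))
```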

unkn0wn · 05-03-2022, 02:24 PM

MIT Technology Review: cover image fails to load

Code:

self.cover_url = soup.find(
    "div", attrs={"class": lambda name: name.startswith("magazineHero__image") if name else False}
).find(
    "img", src=True, attrs={"class": lambda x: x.startswith("image__img") if x else False}
)["src"]

absurl is not required, and the img class needs to be specified.

Also add remove_attributes = ['height', 'width'].
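
Both lambdas in that snippet do the same job: match a class value by prefix while tolerating a missing class, which is what the "if name else False" guard is for. A small sketch of that predicate as a reusable factory. The name class_prefix is my own invention, not something calibre provides.

```python
def class_prefix(prefix):
    """Return a matcher for attribute values that start with `prefix`."""
    def matcher(value):
        # A missing attribute arrives as None, so guard before startswith.
        return bool(value) and value.startswith(prefix)
    return matcher


is_hero = class_prefix("magazineHero__image")
print(is_hero("magazineHero__image--abc123"))  # True
print(is_hero("image__img"))                   # False
print(is_hero(None))                           # False
```

In the recipe this would presumably be passed as attrs={"class": class_prefix("magazineHero__image")}, in place of the inline lambda.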

unkn0wn · 05-03-2022, 02:33 PM

https://github.com/kovidgoyal/calibr...agazine.recipe

The cover fails to load.

Code:

def get_cover_url(self):
    cover_url = None
    soup = self.index_to_soup('https://www.india-seminar.com/')
    citem = soup.find('img', src=lambda x: x and 'covers' in x)
    if citem:
        cover_url = "https://www.india-seminar.com/" + citem['src']
    return cover_url

and

remove_attributes = ['style', 'height', 'width']
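
One thing to watch in get_cover_url: plain concatenation produces a double slash if src starts with '/', and a broken URL if src is already absolute. A sketch using urljoin from the standard library instead. The sample src values are invented, since I have not checked which form the site actually emits, and absolute_cover_url is a hypothetical helper.

```python
from urllib.parse import urljoin

BASE = "https://www.india-seminar.com/"


def absolute_cover_url(src):
    """Resolve an <img src> value against the site root."""
    return urljoin(BASE, src)


# Invented src values covering relative, root-relative and absolute forms.
print(absolute_cover_url("covers/example.jpg"))
print(absolute_cover_url("/covers/example.jpg"))
print(absolute_cover_url("https://www.india-seminar.com/covers/example.jpg"))
```

All three forms resolve to the same absolute URL.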

unkn0wn · 06-01-2022, 02:20 AM

India Seminar
https://github.com/kovidgoyal/calibr...agazine.recipe

Add import re at the top, then add these lines (from line 42) to skip the URL when tag_to_string returns an empty title. At present such articles show up in the ToC without titles.

Quote:

title = self.tag_to_string(a)
title = re.sub(r'\s+', ' ', title).strip()
# skip links whose text is blank
if not title:
    url = ''
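
When testing for a blank title, compare by value or truthiness rather than with "is", which checks object identity and is not a reliable string test. A minimal sketch of the whitespace handling on its own. clean_title is a hypothetical helper; in the recipe the input would come from self.tag_to_string(a).

```python
import re


def clean_title(raw):
    """Collapse whitespace runs to one space and trim the ends."""
    return re.sub(r"\s+", " ", raw).strip()


for raw in ["  The   Problem\n", " \n\t ", ""]:
    title = clean_title(raw)
    # An empty string is falsy, so `not title` catches blank link text.
    print(repr(title), "skip" if not title else "keep")
```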

unkn0wn · 06-01-2022, 11:57 AM

https://github.com/kovidgoyal/calibr...s_today.recipe
The default Business Today magazine page shows the next edition, and they keep adding articles to it. I changed the recipe to pick the current edition instead of the future one that is still under construction.

From line 28:

Code:

def parse_index(self):
    soup = self.index_to_soup('https://www.businesstoday.in/magazine')
    issue = soup.find(attrs={'class': 'view-id-latest_issue_magzine'})
    # [0] is the upcoming issue that is still being filled; [1] is the current one
    a = issue.findAll('a', href=lambda x: x and x.startswith('/magazine/issue/'))[1]
    url = a['href']
    self.log('issue =', url)
    soup = self.index_to_soup('https://www.businesstoday.in' + url)

    tag = soup.find(attrs={'class': 'issue-image'})
    if tag:
        self.cover_url = tag.find('img')['src']
    section = None
    sections = {}

and

Quote:

extra_css = 'a[href^="https://www.businesstoday.in/videos"]{display: none;}'
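
On the parse_index change: the [1] index assumes the block always lists at least two issue links, with the upcoming issue first. A sketch of the same selection with a fallback for a page that lists only one. The function name and hrefs are invented for illustration.

```python
def pick_current_issue(hrefs):
    """Prefer the second issue link; fall back to the first if there is only one."""
    issue_links = [h for h in hrefs if h and h.startswith("/magazine/issue/")]
    if not issue_links:
        return None
    return issue_links[1] if len(issue_links) > 1 else issue_links[0]


# Invented hrefs: upcoming issue first, current issue second.
hrefs = ["/magazine/issue/next-issue", "/magazine/issue/current-issue", "/videos/x", None]
print(pick_current_issue(hrefs))
```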

unkn0wn · 06-02-2022, 01:39 AM

https://github.com/kovidgoyal/calibr...merican.recipe

Scientific American: cover and tags
From line 14:

Code:

keep_classes = {'article-header', 'article-content',
                'article-media', 'article-author', 'article-text', 
                'feature-article--header', 'feature-article--header-title', 
                'opinion-article__header-title', 'author-bio' }
remove_classes = {'aside-banner', 'moreToExplore', 'article-footer', 'flex-column--25', 'article-author__suggested'}

Remove line 60 and add the lines below after line 63 (there is a better cover on the issue page).

Code:

        select = Select(self.index_to_soup(url, as_tree=True))
        cover = [x.get('src', '') for x in select('main .product-detail__image img')][0].split('?')[0]
        self.cover_url = cover + '?w=800'

        feeds = []

The + '?w=800' reduces the size; the original image is about 8k resolution and roughly 1 MB.
Also set masthead_url = 'https://static.scientificamerican.com/sciam/assets/Image/newsletter/salogo.png'

unkn0wn · 07-03-2022, 05:15 AM

Foreign Affairs
The comments section and the issue section contain the same articles. I think adding ignore_duplicate_articles is much easier.

Quote:

ignore_duplicate_articles = {'title', 'url'}
remove_empty_feeds = True
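
To show the effect I am after, here is a rough illustration of dropping repeats by title and URL across sections. This is not calibre's implementation, only a stand-alone sketch. The function name and sample data are made up.

```python
def drop_duplicates(feeds):
    """Keep the first occurrence of each title and each URL across all sections."""
    seen_titles, seen_urls = set(), set()
    result = []
    for section, articles in feeds:
        kept = []
        for art in articles:
            if art["title"] in seen_titles or art["url"] in seen_urls:
                continue
            seen_titles.add(art["title"])
            seen_urls.add(art["url"])
            kept.append(art)
        # Same idea as remove_empty_feeds: sections left empty are dropped.
        if kept:
            result.append((section, kept))
    return result


feeds = [
    ("Comments", [{"title": "A", "url": "/a"}]),
    ("Issue", [{"title": "A", "url": "/a"}, {"title": "B", "url": "/b"}]),
    ("Extras", [{"title": "B", "url": "/b"}]),
]
print(drop_duplicates(feeds))
```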

Foreign Policy cover: it loads an older edition's cover image. Change to:

Quote:

img = soup.find('img', attrs={'data-lazy-src': lambda x: x and '-cover' in x})
self.cover_url = img['data-lazy-src']

unkn0wn · 07-03-2022, 05:18 AM

Nautilus https://github.com/kovidgoyal/calibr...autilus.recipe
Cover method change. I also think oldest_article needs to be 60:
oldest_article = 60  # days

Code:

def get_cover_url(self):
    soup = self.index_to_soup('https://www.presspassnow.com/nautilus/issues/')
    div = soup.find('div', **classes('image-fade_in_back'))
    if div:
        self.cover_url = div.find('img', src=True)['src']
    # fall back to None when no cover was found
    return getattr(self, 'cover_url', None)

unkn0wn · 07-03-2022, 05:21 AM

Swarajya mag https://github.com/kovidgoyal/calibr...warajya.recipe

Adding a description:

Code:

            if url.startswith('/'):
                url = 'https://swarajyamag.com' + url
            title = self.tag_to_string(a)
            # default to an empty description when no author link is found
            desc = ''
            d = a.find_previous_sibling('a', **classes('_2nEd_'))
            if d:
                desc = 'By ' + self.tag_to_string(d)
            self.log(title, ' at ', url, '\n', desc)
            ans.append({'title': title, 'url': url, 'description': desc})
        return [('Articles', ans)]
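
The description needs a value on every pass through the loop, including links with no author tag next to them; otherwise the name is either undefined or carries over from the previous article. A stand-alone sketch of building one entry. make_article and the sample values are hypothetical; only the dict keys and the URL prefix come from the snippet.

```python
def make_article(title, url, author=None):
    """Build one article dict; description is '' when there is no author."""
    if url.startswith("/"):
        url = "https://swarajyamag.com" + url
    desc = "By " + author if author else ""
    return {"title": title, "url": url, "description": desc}


print(make_article("Some title", "/politics/some-title", "A. Writer"))
print(make_article("Other title", "/politics/other-title"))
```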

