05-23-2019, 03:36 AM | #1 |
Newsbeamer dev
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
[Recipe Request] The Baffler
Would it be possible to create a recipe for The Baffler?
All issues can be found at the link below; the latest is always the first one on the left. I had a go myself but could not figure out a way to download the latest issue because of the URL naming scheme.
https://thebaffler.com/issues
Thanks!
Jamie |
05-23-2019, 08:37 PM | #2 |
Enthusiast
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
|
New Recipe for "The Baffler"
Hello Jamie,
This recipe should do the trick. Let me know if it works.
Jose
New Recipe for The Baffler:
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe
import re


def classes(classes):
    # Build soup.find() keyword arguments that match any tag sharing
    # at least one CSS class with the space-separated names given.
    q = frozenset(classes.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


class TheBaffler(BasicNewsRecipe):

    title = 'The Baffler'
    __author__ = 'Jose Ortiz'
    description = ('This magazine contains left-wing criticism, cultural analysis, short'
                   ' stories, poems and art. They publish six print issues annually.')
    language = 'en_US'
    encoding = 'UTF-8'
    no_javascript = True
    no_stylesheets = True

    keep_only_tags = [
        classes('header-contain entry-content')
    ]

    def parse_index(self):
        # The latest issue is the first <article> on the issues page
        soup = self.index_to_soup('https://thebaffler.com/issues').main.article
        self.timefmt = ' [%s]' % self.tag_to_string(soup.find(**classes('date'))).strip()

        try:
            # The cover image is given as url(...) inside an inline style
            self.cover_url = re.sub(
                r'.*?url\((.*?)\).*', r'\1',
                soup.find(**classes('image-fill'))['style']).strip()
            self.log('cover_url at ', self.cover_url)
        except Exception:
            self.log.error('Failed to download cover_url')

        # Follow the first link of that article to the issue's own page
        soup = self.index_to_soup(soup.a['href'])

        # Extract comments from `.entry-content' and prepend to self.description
        self.description = (
            u'\n\n' + self.tag_to_string(soup.find(**classes('entry-content')))
            + u'\n\n' + self.description
        )

        ans = []

        # Articles at `.contents section .meta'
        for section in soup.find(**classes('contents'))('section'):
            current_section = self.tag_to_string(section.h2)
            self.log(current_section)
            articles = []
            for div in section(**classes('meta')):
                # Getting articles
                a = div.find(**classes('title')).a
                title = self.tag_to_string(a)
                url = a['href']
                self.log('\t', title, ' at ', url)
                desc = ''
                r = div.find(**classes('deck'))
                if r is not None:
                    desc = self.tag_to_string(r)
                articles.append(
                    {'title': title, 'url': url, 'description': desc})
            if current_section and articles:
                ans.append((current_section, articles))

        return ans
|
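For anyone adapting the recipe, here is a small standalone sketch of its two less obvious pieces: the classes() helper and the regex that pulls the cover URL out of an inline style. It runs without calibre. The class names and the style string fed to it are made-up sample values, not copied from the site, and cover_from_style is a name introduced here only for illustration. As far as I understand it, BeautifulSoup hands the tag's class attribute to the function stored under attrs['class'], so the sketch calls that function directly.

```python
import re


def classes(names):
    # Same helper as in the recipe: build keyword arguments for soup.find()
    # that match any tag sharing at least one class with `names`.
    q = frozenset(names.split(' '))
    return dict(
        attrs={'class': lambda x: x and frozenset(x.split()).intersection(q)}
    )


def cover_from_style(style):
    # Hypothetical wrapper around the regex used in parse_index():
    # keep only the text that sits inside url(...).
    return re.sub(r'.*?url\((.*?)\).*', r'\1', style).strip()


# The function stored under attrs['class'] decides whether a tag matches.
matches = classes('header-contain entry-content')['attrs']['class']

# Hypothetical class attribute values, not taken from the site.
print(bool(matches('entry-content wide')))  # shares 'entry-content'
print(bool(matches('sidebar')))             # shares no class
print(bool(matches(None)))                  # tag without a class attribute

# Hypothetical inline style; the real markup may differ.
style = 'background-image: url(https://example.com/cover.jpg);'
print(cover_from_style(style))
```

If the site ever changes its class names or stops using an inline style for the cover, these are the two places in the recipe that would most likely need adjusting.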
05-23-2019, 10:27 PM | #3 |
Newsbeamer dev
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Jose - many thanks for this. Works perfectly - very clever!
Thanks Jamie |