06-09-2019, 02:31 PM | #1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jun 2019
Device: Kindle
|
AINOnline [Recipe Request]
Hello all,
First time posting here. I am looking for a recipe to be created for https://www.ainonline.com/. I would do it myself but I have no experience at all in this sector. Thanks |
06-10-2019, 07:03 PM | #2 |
Enthusiast
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
|
Recipe for AINOnline
Hello Samdiggly,
This downloads the articles displayed on the homepage. Recipe for AINOnline: Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Jose Ortiz <jlortiz84 at gmail.com>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe

INDEX = 'https://www.ainonline.com/'


def absurl(url):
    if url.startswith('/'):
        url = INDEX + url[1:]
    return url


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class AINOnline(BasicNewsRecipe):
    title = 'Aviation International News'
    __author__ = 'Jose Ortiz'
    description = ('Aviation International News covers all sectors of the aviation'
                   ' industry, from business aviation to air transport to defense and'
                   ' unmanned aerial vehicles.')
    language = 'en'
    encoding = 'utf-8'
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.ainonline.com/sites/ainonline.com/themes/ain30/images/ainlogo-small.jpg'
    keep_only_tags = [classes('main-content')]
    remove_tags = [
        dict(name=['button', 'input']),
        dict(attrs={'class': lambda x: x and 'comments' in x})
    ]

    def parse_index(self):
        soup = self.index_to_soup(INDEX)
        # css selectors for articles
        # .view-content [class *= 'featured-story']
        # .view-content .views-row
        article_attrs = {
            'class': lambda x: x and (
                'featured-story' in x
                or {'views-row'}.intersection(x.split()))}
        ans = []
        for section in soup.findAll(**classes('view-content')):
            if section.findParent(attrs=dict(id='featured')) is not None:
                current_section = 'Featured'
            elif section.findParent(
                    attrs=dict(id='home-top-stories')) is not None:
                current_section = 'Top Stories'
            elif section.findParent(
                    attrs=dict(
                        id='quicktabs-container-latest_trending')) is not None:
                current_section = 'Latest/Trending'
            else:
                current_section = 'Articles'
            articles = []
            for div in section.findAll(attrs=article_attrs):
                if {'views-row'}.intersection(div['class'].split()):
                    a = div.find(**classes('title')).a
                elif 'featured-story' in div['class']:
                    a = div.find(
                        lambda tag: tag.name == 'a' and tag.find(
                            ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']) is not None)
                title = self.tag_to_string(a)
                url = absurl(a['href'])
                desc = ''
                r = div.find(**classes('teaser'))
                if r is not None:
                    desc = self.tag_to_string(r)
                articles.append(
                    {'title': title, 'url': url, 'description': desc})
            if articles:
                for title, articles_ in ans:
                    if current_section == title:
                        articles_.extend(articles)
                        break
                else:
                    ans.append((current_section, articles))
        return ans
|
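The last few lines of parse_index use a for/else to merge articles into a section that is already in the result, and to append a new (section, articles) pair otherwise. A minimal standalone sketch of that pattern follows; the helper name merge_section and the sample articles are hypothetical and not part of the recipe.

```python
def merge_section(ans, section, articles):
    """Add articles to the (title, articles) entry named section, creating it if absent."""
    if not articles:
        return ans
    for title, existing in ans:
        if title == section:
            existing.extend(articles)
            break
    else:
        # The loop ended without break, so no entry has this title yet.
        ans.append((section, articles))
    return ans


# Hypothetical sample data in the shape parse_index returns.
index = []
merge_section(index, 'Featured', [{'title': 'A', 'url': 'https://www.ainonline.com/a', 'description': ''}])
merge_section(index, 'Articles', [{'title': 'B', 'url': 'https://www.ainonline.com/b', 'description': ''}])
merge_section(index, 'Featured', [{'title': 'C', 'url': 'https://www.ainonline.com/c', 'description': ''}])

summary = [(title, [a['title'] for a in arts]) for title, arts in index]
print(summary)
```

The second 'Featured' call extends the existing entry instead of creating a duplicate section.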
06-11-2019, 07:15 AM | #3 |
Newsbeamer dev
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
|
06-11-2019, 04:58 PM | #4 | |
Enthusiast
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
|
Quote:
version 3.39.1 and the recipe works fine for me. What version are you using? Anyways, I made a few changes to it so that it looks more consistent, but it's practically the same recipe. Maybe it will solve the problem. |
|
06-12-2019, 03:44 AM | #5 | |
Newsbeamer dev
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
|
Quote:
|
|
06-12-2019, 03:57 AM | #6 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
div['class'] is a list, not a string, in BS4, which newer versions of calibre use.
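To illustrate why this matters for the recipe above, here is a small sketch using plain Python values as stand-ins for what div['class'] returns (no BeautifulSoup import; the sample class names are made up). It shows two effects: .split() is not available on a list, and the `in` test changes from a substring match to an exact membership test.

```python
# Stand-ins for the value of div['class']; the class names are illustrative only.
bs3_value = 'views-row featured-story-large'        # BS3-style: one string
bs4_value = ['views-row', 'featured-story-large']   # BS4-style: list of class names

# On a string, `in` is a substring test; on a list, it is exact membership.
substring_hit = 'featured-story' in bs3_value
membership_hit = 'featured-story' in bs4_value

# str has split(); list does not, so the recipe's div['class'].split() fails.
try:
    bs4_value.split()
    split_error = None
except AttributeError as err:
    split_error = type(err).__name__

print(substring_hit, membership_hit, split_error)
```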
|
06-12-2019, 02:24 PM | #7 | |
Enthusiast
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
|
Quote:
Also I see that you have fixed the problem in the git repository, so thanks for that too. |
|
06-12-2019, 07:44 PM | #8 |
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
For new recipes it does not matter: new recipes only become available once a new release of calibre is made, so just use bs4.
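For updating old recipes, one just has to be a bit careful to write code that works for both. The main incompatibilities are creating Tag objects and directly checking class values. For creating Tag objects, use this wrapper function, which works with both bs3 and bs4. Code:
# Tag is assumed to come from calibre's bundled BeautifulSoup module
from calibre.ebooks.BeautifulSoup import Tag


def new_tag(soup, name, attrs=()):
    # bs4 soups provide a new_tag() method; bs3 soups do not
    impl = getattr(soup, 'new_tag', None)
    if impl is not None:
        return impl(name, attrs=dict(attrs))
    # bs3 fallback: construct the Tag directly
    return Tag(soup, name, attrs=attrs or None)
Code:
def check_in(tag, attr, val):
    # The attribute value is a list in bs4 and a space-separated string in bs3
    q = tag[attr]
    if not isinstance(q, list):
        q = q.split()
    return val in q
|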
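A quick way to try check_in without calibre installed is to pass plain dicts in place of Tag objects, since the function only uses tag[attr]. This is a sketch under that assumption; it also assumes the function is meant to test val (not attr) against the attribute's values, and the sample class names are made up.

```python
def check_in(tag, attr, val):
    # Normalise the attribute to a list of words, then test for an exact match.
    q = tag[attr]
    if not isinstance(q, list):
        q = q.split()
    return val in q


# Dicts standing in for tags: string value (bs3-style) and list value (bs4-style).
bs3_like = {'class': 'views-row views-row-1'}
bs4_like = {'class': ['views-row', 'views-row-1']}

results = [
    check_in(bs3_like, 'class', 'views-row'),
    check_in(bs4_like, 'class', 'views-row'),
    check_in(bs3_like, 'class', 'views'),  # partial names do not match
]
print(results)
```

In the recipe, a test such as {'views-row'}.intersection(div['class'].split()) could then be written as check_in(div, 'class', 'views-row').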
07-13-2019, 04:37 PM | #9 |
Junior Member
Posts: 3
Karma: 10
Join Date: Jun 2019
Device: Kindle
|
Any updates?
Any updates on this project? I am still looking into moving this into Calibre
|
|