AINOnline [Recipe Request]

Samdiggly · 06-09-2019, 02:31 PM

Hello all,
First time posting here.
I am looking for a recipe to be created for https://www.ainonline.com/.
I would do it myself but I have no experience at all in this sector.
Thanks

lui1 · 06-10-2019, 07:03 PM

Hello Samdiggly,

This downloads the articles displayed on the homepage.

Recipe for AINOnline:

Code:

#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Jose Ortiz <jlortiz84 at gmail.com>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe


INDEX = 'https://www.ainonline.com/'


def absurl(url):
    if url.startswith('/'):
        url = INDEX + url[1:]
    return url


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class AINOnline(BasicNewsRecipe):
    title = 'Aviation International News'
    __author__ = 'Jose Ortiz'
    description = ('Aviation International News covers all sectors of the aviation'
                   ' industry, from business aviation to air transport to defense and'
                   ' unmanned aerial vehicles.')
    language = 'en'
    encoding = 'utf-8'
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.ainonline.com/sites/ainonline.com/themes/ain30/images/ainlogo-small.jpg'
    keep_only_tags = [classes('main-content')]
    remove_tags = [
        dict(name=['button', 'input']),
        dict(attrs={'class': lambda x: x and 'comments' in x})
    ]

    def parse_index(self):

        soup = self.index_to_soup(INDEX)

        # css selectors for articles
        #     .view-content [class *= 'featured-story']
        #     .view-content .views-row
        article_attrs = {
            'class': lambda x: x and (
                'featured-story' in x
                or {'views-row'}.intersection(x.split()))}

        ans = []

        for section in soup.findAll(**classes('view-content')):

            if section.findParent(
                    attrs=dict(id='featured')) is not None:
                current_section = 'Featured'
            elif section.findParent(
                    attrs=dict(
                        id='home-top-stories')) is not None:
                current_section = 'Top Stories'
            elif section.findParent(
                    attrs=dict(
                        id='quicktabs-container-latest_trending'
                    )) is not None:
                current_section = 'Latest/Trending'
            else:
                current_section = 'Articles'

            articles = []
            for div in section.findAll(attrs=article_attrs):
                if {'views-row'}.intersection(div['class'].split()):
                    a = div.find(**classes('title')).a
                elif 'featured-story' in div['class']:
                    a = div.find(
                        lambda tag: tag.name == 'a'
                        and tag.find(['h1', 'h2', 'h3', 'h4', 'h5', 'h6'])
                        is not None)
                title = self.tag_to_string(a)
                url = absurl(a['href'])
                desc = ''
                r = div.find(**classes('teaser'))
                if r is not None:
                    desc = self.tag_to_string(r)
                articles.append(
                    {'title': title, 'url': url, 'description': desc})
            if articles:
                for title, articles_ in ans:
                    if current_section == title:
                        articles_.extend(articles)
                        break
                else:
                    ans.append((current_section, articles))

        return ans

duluoz · 06-11-2019, 07:15 AM

Quote:

Originally Posted by lui1

Hello Samdiggly,

This downloads the articles displayed on the homepage.

Hi lui1,

When using this recipe with ebook-convert I get an error message from line 75 saying:

Code:

'list' object has no attribute 'split'

Any ideas why this could be?

Thanks again

lui1 · 06-11-2019, 04:58 PM

Quote:

Originally Posted by duluoz

Hi lui1,

When using this recipe with ebook-convert I get an error message from line 75 saying:

Code:

'list' object has no attribute 'split'

Any ideas why this could be?

Thanks again

I don't know why you are getting that error message, but I'm using calibre
version 3.39.1 and the recipe works fine for me. What version are you
using?

Anyways, I made a few changes to it so that it looks more consistent, but it's
practically the same recipe. Maybe it will solve the problem.

duluoz · 06-12-2019, 03:44 AM

Quote:

Originally Posted by lui1

I don't know why you are getting that error message, but I'm using calibre
version 3.39.1 and the recipe works fine for me. What version are you
using?

Anyways, I made a few changes to it so that it looks more consistent, but it's
practically the same recipe. Maybe it will solve the problem.

Calibre version 3.44 - the latest version. Looks like it breaks your recipe. Wonder if Kovid could offer some advice?

kovidgoyal · 06-12-2019, 03:57 AM

div['class'] is a list, not a string, in BS4 which newer versions of calibre use
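
To make the difference concrete, here is a small sketch that needs neither calibre nor BeautifulSoup. The two variables are hypothetical stand-ins for what div['class'] returns under each library (a string in BS3, a list in BS4), and the class names are made up for illustration.

```python
# Stand-ins for the value of div['class']; not real BeautifulSoup objects.
bs3_value = 'views-row views-row-1'        # BS3: one space-separated string
bs4_value = ['views-row', 'views-row-1']   # BS4: a list of class names

# The recipe's check works on the string form...
bs3_match = {'views-row'}.intersection(bs3_value.split())

# ...but a list has no split() method, which is the reported error.
try:
    bs4_value.split()
    error = None
except AttributeError as e:
    error = str(e)

print(bs3_match)
print(error)
```

The second print shows the same message that ebook-convert reported for the recipe.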

lui1 · 06-12-2019, 02:24 PM

Quote:

Originally Posted by kovidgoyal

div['class'] is a list, not a string, in BS4 which newer versions of calibre use

Oh I see, thanks Kovid. I was using the one that comes from debian, so I'll have to upgrade to the one you publish on your website. What would you suggest if I wanted the recipes to run on most versions of calibre, given that many people may not have upgraded to the latest version? I suppose I could write more code to account for the differences between bs3 and bs4, or use lxml, which I assume is compatible across all versions of calibre.

Also I see that you have fixed the problem in the git repository, so thanks for that too.

kovidgoyal · 06-12-2019, 07:44 PM

For new recipes it does not matter, since new recipes are only available once a new release of calibre is made; just use bs4.

For updating old recipes, one just has to be a bit careful to write code that works for both. The main incompatibilities are creating Tag objects and directly checking class values.

For creating Tag objects, use this wrapper function, which works with both bs3 and bs4.

Code:

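from calibre.ebooks.BeautifulSoup import Tag


def new_tag(soup, name, attrs=()):
    # bs4 soups provide a new_tag() factory; bs3 soups do not
    impl = getattr(soup, 'new_tag', None)
    if impl is not None:
        return impl(name, attrs=dict(attrs))
    # bs3 fallback: construct the Tag directly
    return Tag(soup, name, attrs=attrs or None)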

If you wish to directly check if the class attribute of a tag has a value, use something like:

Code:

def check_in(tag, attr, val):
    # bs4 returns a list for class; bs3 returns a space-separated string
    q = tag[attr]
    if not isinstance(q, list):
        q = q.split()
    return val in q
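
As a rough sketch of how the same idea might be applied to the two class tests in the AINOnline recipe, the snippet below normalizes the class value once and then does an exact match for views-row and a substring match for featured-story. The helper names (class_tokens, is_views_row, is_featured_story) are hypothetical, and plain dicts stand in for tags since only tag['class'] is needed; real code would pass BeautifulSoup tags.

```python
def class_tokens(tag):
    # Normalize tag['class'] to a list of class names for both bs3 and bs4.
    q = tag['class']
    if not isinstance(q, list):
        q = q.split()
    return q


def is_views_row(tag):
    # Exact class-name match, like the selector .views-row
    return 'views-row' in class_tokens(tag)


def is_featured_story(tag):
    # Substring match on any class name, like [class *= 'featured-story']
    return any('featured-story' in c for c in class_tokens(tag))


# Hypothetical stand-ins for tags: one shaped like bs3, one like bs4.
bs3_div = {'class': 'views-row views-row-odd'}
bs4_div = {'class': ['home-featured-story-1', 'clearfix']}

print(is_views_row(bs3_div), is_featured_story(bs3_div))
print(is_views_row(bs4_div), is_featured_story(bs4_div))
```

The substring case is handled separately because an expression such as 'featured-story' in div['class'] is a substring test on a bs3 string but an exact-element test on a bs4 list.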

Samdiggly · 07-13-2019, 04:37 PM

Any updates on this project? I am still looking into moving this into Calibre
