Old 06-09-2019, 02:31 PM   #1
Samdiggly
Junior Member
Samdiggly began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jun 2019
Device: Kindle
AINOnline [Recipe Request]

Hello all,
First time posting here.
I am looking for a recipe to be created for https://www.ainonline.com/.
I would do it myself but I have no experience at all in this sector.
Thanks
Old 06-10-2019, 07:03 PM   #2
lui1
Enthusiast
lui1 began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
Recipe for AINOnline

Hello Samdiggly,

This downloads the articles displayed on the homepage.

Recipe for AINOnline:
Code:
#!/usr/bin/env python2
# vim:fileencoding=utf-8
# License: GPLv3 Copyright: 2019, Jose Ortiz <jlortiz84 at gmail.com>

from __future__ import (unicode_literals, division, absolute_import,
                        print_function)
from calibre.web.feeds.recipes import BasicNewsRecipe


INDEX = 'https://www.ainonline.com/'


def absurl(url):
    if url.startswith('/'):
        url = INDEX + url[1:]
    return url


def classes(classes):
    q = frozenset(classes.split(' '))
    return dict(attrs={
        'class': lambda x: x and frozenset(x.split()).intersection(q)})


class AINOnline(BasicNewsRecipe):
    title = 'Aviation International News'
    __author__ = 'Jose Ortiz'
    description = ('Aviation International News covers all sectors of the aviation'
                   ' industry, from business aviation to air transport to defense and'
                   ' unmanned aerial vehicles.')
    language = 'en'
    encoding = 'utf-8'
    no_stylesheets = True
    remove_javascript = True
    masthead_url = 'https://www.ainonline.com/sites/ainonline.com/themes/ain30/images/ainlogo-small.jpg'
    keep_only_tags = [classes('main-content')]
    remove_tags = [
        dict(name=['button', 'input']),
        dict(attrs={'class': lambda x: x and 'comments' in x})
    ]

    def parse_index(self):

        soup = self.index_to_soup(INDEX)

        # css selectors for articles
        #     .view-content [class *= 'featured-story']
        #     .view-content .views-row
        article_attrs = {
            'class': lambda x: x and (
                'featured-story' in x
                or {'views-row'}.intersection(x.split()))}

        ans = []

        for section in soup.findAll(**classes('view-content')):

            if section.findParent(
                    attrs=dict(id='featured')) is not None:
                current_section = 'Featured'
            elif section.findParent(
                    attrs=dict(
                        id='home-top-stories')) is not None:
                current_section = 'Top Stories'
            elif section.findParent(
                    attrs=dict(
                        id='quicktabs-container-latest_trending'
                    )) is not None:
                current_section = 'Latest/Trending'
            else:
                current_section = 'Articles'

            articles = []
            for div in section.findAll(attrs=article_attrs):
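                # Assumes div['class'] is a string (BS3); under BS4 it is a
                # list, so .split() raises AttributeError here.
                if {'views-row'}.intersection(div['class'].split()):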
                    a = div.find(**classes('title')).a
                elif 'featured-story' in div['class']:
                    a = div.find(
                        lambda tag: tag.name == 'a'
                        and tag.find(['h1','h2','h3','h4','h5','h6'])
                        is not None)
                title = self.tag_to_string(a)
                url = absurl(a['href'])
                desc = ''
                r = div.find(**classes('teaser'))
                if r is not None:
                    desc = self.tag_to_string(r)
                articles.append(
                    {'title': title, 'url': url, 'description': desc})
            if articles:
                for title, articles_ in ans:
                    if current_section == title:
                        articles_.extend(articles)
                        break
                else:
                    ans.append((current_section, articles))

        return ans
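To make the return value of parse_index concrete: it is a list of (section title, article list) pairs, and the for/else at the end merges articles into a section that is already present instead of adding a duplicate entry. Below is a minimal standalone sketch of just that merge step. The helper name add_section and the example URLs are hypothetical, used only for illustration; they are not part of the recipe.

```python
def add_section(ans, section_title, articles):
    """Merge articles into an existing (title, articles) pair, or append a new pair."""
    if not articles:
        return ans
    for title, existing in ans:
        if title == section_title:
            # Section already present: extend its article list in place.
            existing.extend(articles)
            break
    else:
        # The loop finished without break, so this section is new.
        ans.append((section_title, articles))
    return ans


# Hypothetical article dicts in the shape the recipe builds.
a = {'title': 'A', 'url': 'https://www.ainonline.com/a', 'description': ''}
b = {'title': 'B', 'url': 'https://www.ainonline.com/b', 'description': ''}
c = {'title': 'C', 'url': 'https://www.ainonline.com/c', 'description': ''}

feeds = []
add_section(feeds, 'Featured', [a])
add_section(feeds, 'Articles', [b])
add_section(feeds, 'Featured', [c])  # merged into the first entry

print([(title, [art['title'] for art in arts]) for title, arts in feeds])
```

The else clause belongs to the for loop, not to the if: it runs only when no existing section matched.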
Attached Files
File Type: zip ainonline.zip (1.4 KB, 153 views)
Old 06-11-2019, 07:15 AM   #3
duluoz
Newsbeamer dev
duluoz can extract oil from cheese
 
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
Quote:
Originally Posted by lui1 View Post
Hello Samdiggly,

This downloads the articles displayed on the homepage.
Hi lui1,

When using this recipe with ebook-convert I get an error message from line 75 saying:
Code:
'list' object has no attribute 'split'
Any ideas why this could be?

Thanks again
Old 06-11-2019, 04:58 PM   #4
lui1
Enthusiast
lui1 began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
Quote:
Originally Posted by duluoz View Post
Hi lui1,

When using this recipe with ebook-convert I get an error message from line 75 saying:
Code:
'list' object has no attribute 'split'
Any ideas why this could be?

Thanks again
I don't know why you are getting that error message, but I'm using calibre
version 3.39.1 and the recipe works fine for me. What version are you
using?

Anyways, I made a few changes to it so that it looks more consistent, but it's
practically the same recipe. Maybe it will solve the problem.
Attached Files
File Type: zip ainonline.zip (1.4 KB, 153 views)
Old 06-12-2019, 03:44 AM   #5
duluoz
Newsbeamer dev
duluoz can extract oil from cheese
 
Posts: 122
Karma: 1000
Join Date: Dec 2011
Device: Kindle Voyage
Quote:
Originally Posted by lui1 View Post
I don't know why you are getting that error message, but I'm using calibre
version 3.39.1 and the recipe works fine for me. What version are you
using?

Anyways, I made a few changes to it so that it looks more consistent, but it's
practically the same recipe. Maybe it will solve the problem.
Calibre version 3.44 - the latest version. Looks like it breaks your recipe. Wonder if Kovid could offer some advice?
Old 06-12-2019, 03:57 AM   #6
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
In BS4, which newer versions of calibre use, div['class'] is a list, not a string.
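A minimal sketch of why the reported error appears, using plain Python values as stand-ins for what div['class'] evaluates to under each parser. No calibre or BeautifulSoup import is needed; the sample class names and the helper has_views_row are assumptions for illustration only.

```python
# Stand-ins for div['class'] (illustrative values, not taken from the site).
bs3_value = 'views-row views-row-1'        # BS3: one space-separated string
bs4_value = ['views-row', 'views-row-1']   # BS4: a list of class names


def has_views_row(class_value):
    # Same expression the recipe applies to div['class'].
    return bool({'views-row'}.intersection(class_value.split()))


bs3_result = has_views_row(bs3_value)

try:
    has_views_row(bs4_value)
    bs4_error = None
except AttributeError as err:
    # Lists have no .split(), which is the message reported above.
    bs4_error = str(err)

print(bs3_result)
print(bs4_error)
```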
Old 06-12-2019, 02:24 PM   #7
lui1
Enthusiast
lui1 began at the beginning.
 
Posts: 36
Karma: 10
Join Date: Dec 2017
Location: Los Angeles, CA
Device: Smart Phone
Quote:
Originally Posted by kovidgoyal View Post
div['class'] is a list, not a string, in BS4 which newer versions of calibre use
Oh I see, thanks Kovid. I was using the version that comes with Debian, so I'll have to upgrade to the one you publish on your website. What would you suggest if I wanted the recipes to run on most versions of calibre, given that many people probably haven't upgraded to the latest version? I suppose I could write more code to account for the differences between bs3 and bs4, or use lxml, which I assume is compatible across all versions of calibre.

Also I see that you have fixed the problem in the git repository, so thanks for that too.
Old 06-12-2019, 07:44 PM   #8
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
For new recipes it does not matter, since new recipes are only available once a new release of calibre is made; just use bs4.

For updating old recipes, one just has to be a bit careful to write code that works for both. The main incompatibilities are creating Tag objects and directly checking class values.

For creating Tag objects, use this wrapper function, which works with both bs3 and bs4.

Code:
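from calibre.ebooks.BeautifulSoup import Tag

def new_tag(soup, name, attrs=()):
    # BS4 soups provide new_tag(); BS3 soups do not.
    impl = getattr(soup, 'new_tag', None)
    if impl is not None:
        return impl(name, attrs=dict(attrs))
    # BS3 fallback: build the Tag directly.
    return Tag(soup, name, attrs=attrs or None)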
If you wish to directly check whether the class attribute of a tag has a value, use something like:

Code:
def check_in(tag, attr, val):
    q = tag[attr]
    # BS3 returns a string, BS4 a list; normalise to a list.
    if not isinstance(q, list):
        q = q.split()
    return val in q
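The recipe also matches 'featured-story' as a substring of the class attribute (the [class *= 'featured-story'] selector in its comments), which an exact membership test on a BS4 list would not reproduce. Here is a minimal sketch of a substring check that tolerates both forms. It uses plain dicts as stand-ins for Tag objects and relies only on .get(), which both provide. The name class_contains and the sample class names are hypothetical, not part of calibre.

```python
def class_contains(tag, fragment):
    """Return True if any class name on tag contains fragment (BS3 or BS4 style)."""
    q = tag.get('class') or []
    if not isinstance(q, list):
        # BS3 style: one space-separated string.
        q = q.split()
    return any(fragment in name for name in q)


# Dict stand-ins for tags; the class names are made up for illustration.
bs4_like = {'class': ['node', 'featured-story-large']}
bs3_like = {'class': 'node featured-story-large'}
plain_row = {'class': ['views-row', 'views-row-1']}
no_class = {}

print(class_contains(bs4_like, 'featured-story'))
print(class_contains(bs3_like, 'featured-story'))
print(class_contains(plain_row, 'featured-story'))
print(class_contains(no_class, 'featured-story'))
```

Using .get() rather than indexing also avoids a KeyError on tags that have no class attribute at all.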
Old 07-13-2019, 04:37 PM   #9
Samdiggly
Junior Member
Samdiggly began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Jun 2019
Device: Kindle
Any updates?

Any updates on this project? I am still looking into moving this into Calibre.