05-05-2023, 09:38 PM   #1
leader_montanus
Junior Member
leader_montanus began at the beginning.
 
Posts: 9
Karma: 10
Join Date: May 2023
Device: Onyx Boox Nova Air
Appending articles to a feed fails

Hi,

I've been using Calibre for a few years and have also used the recipe function to download news every day. Most of the recipes I use are slightly modified and based on RSS feeds.

I'm currently stumped trying to add a few articles from an HTML page to a previously populated feed. I have looked at the example here showing how to add articles, and at several existing recipes.

What I am doing is as follows:
  1. I use a standard RSS feed definition:
    Code:
    feeds = [ ('Long Reads', 'https://longreads.com/feed/'), ]
    (There are in fact several other RSS feeds here; for clarity I am showing only the relevant one.)
  2. I have created a parse_feeds method that first calls the base class's parse_feeds, then loops through all the feeds/articles to check for one particular page that is updated weekly (the 5 best long reads).
  3. It then extracts the links on this page and tries to append them to the feeds list. The code is as follows:

    Code:
    def parse_feeds(self):
        feeds = super(LongReads, self).parse_feeds()

        for articles in feeds:
            section = articles.title
            for article in articles:
                if article.url and 'longreads.com' in article.url:
                    raw = browser().open_novisit(article.url).read()
                    soup = BeautifulSoup(raw)
                    newArticles = []
                    for item in soup.findAll('a', attrs={'target': '_blank'}):
                        if item.parent.name == 'h3':
                            newArt = {}
                            newArt['title'] = item.string
                            newArt['url'] = item['href']
                            newArticles.append(newArt)
                    feeds.append((section, newArticles))
        return feeds
  4. An example of the page being downloaded can be seen here: https://longreads.com/2023/04/21/the...-the-week-462/
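The selection rule in step 3 (an <a target="_blank"> whose direct parent is an <h3>) can be tried outside Calibre with only Python's standard library. This is a minimal sketch, not the recipe's code: it swaps BeautifulSoup for html.parser, and the class name H3LinkParser and the sample markup are hypothetical.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not be tracked as parents.
VOID_TAGS = {'area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'}


class H3LinkParser(HTMLParser):
    """Hypothetical helper: collect (title, url) for <a target="_blank"> inside <h3>."""

    def __init__(self):
        super().__init__()
        self.stack = []        # currently open tags, used to find each <a>'s parent
        self.links = []        # collected (title, url) pairs
        self._current = None   # [text parts, href] while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        if tag == 'a' and attrs.get('target') == '_blank' and parent == 'h3':
            self._current = [[], attrs.get('href')]
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            text, href = self._current
            self.links.append((''.join(text).strip(), href))
            self._current = None
        # pop back to the matching open tag (tolerates unclosed inner tags)
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self._current is not None:
            self._current[0].append(data)


SAMPLE = '''
<h3><a href="https://example.com/one" target="_blank">1. First story</a></h3>
<p>By someone</p>
<p><a href="https://example.com/other" target="_blank">not inside an h3</a></p>
<h3><a href="https://example.com/two">no target attribute</a></h3>
'''

parser = H3LinkParser()
parser.feed(SAMPLE)
print(parser.links)  # [('1. First story', 'https://example.com/one')]
```

Only the first link qualifies: the second has a <p> parent and the third lacks target="_blank".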

The links are extracted correctly; the issue is that I always get the error 'tuple' object has no attribute 'title'. The example I based this on is obviously old, but I also see several newer recipes that append to the feeds list successfully.

Printing the feeds list shows this (excerpt), so the links are obviously being added incorrectly:

Code:
____________________
Title       : SolarWinds: The Untold Story of the Boldest Supply-Chain Hack
URL         : https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/
Author      : Kim Zetter
Summary     : The attackers were i...
Date        : Tue, 02 May, 2023 12:00
TOC thumb   : None
Has content : False

, ('section', [{'title': '1. A Trucker’s Kidnapping, a Suspicious Ransom, and a Colorado Family’s Perilous Quest for Justice', 'url': 'https://www.5280.com/a-truckers-kidnapping-a-suspicious-ransom-and-a-colorado-familys-perilous-quest-for-justice/?src=longreads'},

I have also tried creating a list of Feed objects, but when I append those, it complains that 'Feed' object has no attribute 'articles'.
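The first error can be reproduced without Calibre. In this minimal sketch, StubFeed is a hypothetical stand-in for Calibre's Feed class; the point is that the outer for loop iterates over feeds while its body appends to that same list, so the loop eventually reaches the appended tuple, which has no title attribute.

```python
class StubFeed:
    """Hypothetical stand-in for Calibre's Feed: a title plus an iterable of article URLs."""

    def __init__(self, title, urls):
        self.title = title
        self.urls = urls

    def __iter__(self):
        return iter(self.urls)


def buggy_parse_feeds(feeds):
    # Same loop shape as the parse_feeds() above: it appends to the list it is
    # iterating over, so the outer loop later visits the appended tuple.
    for articles in feeds:
        section = articles.title  # AttributeError once `articles` is the tuple
        for url in articles:
            if 'longreads.com' in url:
                feeds.append((section, [{'title': 'Example', 'url': 'https://example.com/x'}]))
    return feeds


error_message = None
try:
    buggy_parse_feeds([StubFeed('Long Reads', ['https://longreads.com/weekly-picks/'])])
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'tuple' object has no attribute 'title'
```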

I'd be grateful for any help with this; there is obviously something simple that I cannot see...
05-05-2023, 11:57 PM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
 
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
feeds is a list of Feed objects. The form (title, list of articles) is used in parse_index(), not parse_feeds().
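To make the difference concrete: parse_index() works with plain Python data, a list of (section title, list of article dicts), whereas parse_feeds() works with Calibre Feed objects. The sketch below builds an index in that shape; is_index_shape is a hypothetical helper written only for this illustration, and the sample article is made up.

```python
def is_index_shape(index):
    """Hypothetical helper: True if `index` is a list of (title, [article dict, ...]) tuples."""
    if not isinstance(index, list):
        return False
    for entry in index:
        if not (isinstance(entry, tuple) and len(entry) == 2):
            return False
        title, articles = entry
        if not isinstance(title, str) or not isinstance(articles, list):
            return False
        # this sketch requires every article dict to have at least a title and a url
        if not all(isinstance(a, dict) and 'title' in a and 'url' in a for a in articles):
            return False
    return True


index = [
    ('Long Reads', [
        {'title': 'Example story', 'url': 'https://example.com/story',
         'description': 'One-line summary'},
    ]),
]

print(is_index_shape(index))          # True
print(is_index_shape([index[0][1]]))  # False: a bare list of articles has no section title
```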
05-06-2023, 03:51 PM   #3
leader_montanus
Quote:
Originally Posted by kovidgoyal
feeds is a list of Feed objects. The form (title, list of articles) is used in parse_index(), not parse_feeds().
Thanks for the quick response and for pointing me in the right direction; it shows why it's worth supporting Calibre! This should also teach me not to program at night when I'm tired.

The code that works in the end looks like this, using Calibre's built-in feeds_from_index function to create Feed objects:

Code:
from datetime import date

from calibre import browser
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds import feeds_from_index
from calibre.web.feeds.news import BasicNewsRecipe


class LongReads(BasicNewsRecipe):
    # other RSS feeds omitted for clarity
    feeds = [('Long Reads', 'https://longreads.com/feed/')]

    # override parse_feeds and then add the links from the Long Reads HTML page to the feeds list
    def parse_feeds(self):
        feeds = super(LongReads, self).parse_feeds()

        # Loop through the existing articles until we hit the one from the Long Reads website
        newArticles = []
        for curfeed in feeds:
            for curarticle in curfeed.articles:

                # found the Long Reads page: extract links and summaries with BeautifulSoup
                if curarticle.url and 'longreads.com' in curarticle.url:
                    raw = browser().open_novisit(curarticle.url).read()
                    soup = BeautifulSoup(raw)
                    for item in soup.findAll('a', attrs={'target': '_blank'}):
                        if item.parent.name == 'h3':
                            # found a link: add a dict in the basic article format to the list
                            newArticles.append({
                                'title': item.string,
                                'date': date.today(),
                                'url': item['href'],
                                'description': item.parent.findNext('p').findNext('p').contents[0],
                            })

                    # If there are any links, create new Feed objects from them
                    if newArticles:
                        # built-in helper: turns a parse_index()-style list of
                        # (title, list of article dicts) into Feed objects
                        newfeeds = feeds_from_index([('Long Reads', newArticles)],
                                                    oldest_article=self.oldest_article,
                                                    max_articles_per_feed=self.max_articles_per_feed)
                        # extend, not append, so each Feed object is added individually
                        feeds.extend(newfeeds)

                    # finally remove the original feed that held the link page;
                    # modifying feeds here is safe because we return immediately
                    feeds.remove(curfeed)
                    return feeds

        # catch-all in case the Long Reads page was not found
        return feeds
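
The comment about adding the new feeds one by one comes down to list.append versus list.extend: feeds_from_index() returns a list, and appending that list would insert it as a single nested element instead of adding the Feed objects themselves. A minimal illustration, with plain strings standing in for Feed objects:

```python
feeds = ['feed A', 'feed B']    # stand-ins for the existing Feed objects
newfeeds = ['Long Reads feed']  # stand-in for what feeds_from_index() returns

nested = list(feeds)
nested.append(newfeeds)  # adds the whole list as ONE element
flat = list(feeds)
flat.extend(newfeeds)    # adds each element individually, like the one-by-one loop

print(nested)  # ['feed A', 'feed B', ['Long Reads feed']]
print(flat)    # ['feed A', 'feed B', 'Long Reads feed']
```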
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.