Old 05-05-2023, 09:38 PM   #1
leader_montanus
Device: Onyx Boox Nova Air
Appending articles to a feed fails

Hi,

I've been using Calibre for a few years and have also used the recipe function to download news every day. Most of the recipes I use are slightly modified and based on RSS feeds.

I'm currently stumped trying to add a few articles from an HTML page to a previously populated feed. I have looked at the example here of how to add articles, and I have also looked at several existing recipes.

What I am doing is as follows:
  1. Use a standard RSS feed definition:
    Code:
    feeds = [ ('Long Reads', 'https://longreads.com/feed/'), ]
    (There are in fact several other RSS feeds here; for clarity I am showing only the relevant one.)
  2. I have created a parse_feeds function that first runs the base parse_feeds function, then loops through all the feeds/articles to check for one particular page, which is updated weekly (5 best long reads).
  3. It then extracts the links on this page and tries to append them to the feeds list. The code is as follows:

    Code:
     
    def parse_feeds(self):
        feeds = super(LongReads, self).parse_feeds()

        for articles in feeds:
            section = articles.title
            for article in articles:
                if article.url and 'longreads.com' in article.url:
                    raw = browser().open_novisit(article.url).read()
                    soup = BeautifulSoup(raw)
                    newArticles = []
                    for item in soup.findAll('a', attrs={'target': '_blank'}):
                        if item.parent.name == 'h3':
                            newArt = {}
                            newArt['title'] = item.string
                            newArt['url'] = item['href']
                            newArticles.append(newArt)
                    feeds.append((section, newArticles))
        return feeds
  4. An example of the page being downloaded can be seen here: https://longreads.com/2023/04/21/the...-the-week-462/

The links are extracted correctly; the issue is that I always get the error 'tuple' object has no attribute 'title'. The example I based this on is obviously old, but I also see several newer recipes where appending to the feed list works.

Outputting the feeds array shows this (excerpt), so obviously the links are added incorrectly:

Code:
____________________
Title       : SolarWinds: The Untold Story of the Boldest Supply-Chain Hack
URL         : https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/
Author      : Kim Zetter
Summary     : The attackers were i...
Date        : Tue, 02 May, 2023 12:00
TOC thumb   : None
Has content : False

, ('section', [{'title': '1. A Trucker’s Kidnapping, a Suspicious Ransom, and a Colorado Family’s Perilous Quest for Justice', 'url': 'https://www.5280.com/a-truckers-kidnapping-a-suspicious-ransom-and-a-colorado-familys-perilous-quest-for-justice/?src=longreads'},
I have also tried to create an array of Feed objects; when I use the append function, it then complains that 'Feed' object has no attribute 'articles'.
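
One possible reading of both errors, as a minimal sketch rather than a confirmed diagnosis: parse_feeds returns a list of feed objects, so a plain (section, articles) tuple appended to it has no title attribute, and because the append happens while the same list is still being iterated, the loop eventually reaches that tuple. The sketch below uses StubFeed and StubArticle, which are hypothetical stand-ins and not calibre classes, and a hypothetical find_links callback in place of the BeautifulSoup extraction. In a real recipe the wrapping would presumably be done with calibre.web.feeds.Feed and its populate_from_preparsed_feed(title, articles) method; as far as I can tell a bare Feed() only gets its articles attribute once one of the populate methods has run, which would match the second error.

```python
class StubArticle:
    """Hypothetical stand-in for an article object (not a calibre class)."""

    def __init__(self, title, url):
        self.title = title
        self.url = url


class StubFeed:
    """Hypothetical stand-in for a feed: has a title and iterates over its articles."""

    def __init__(self, title, articles):
        self.title = title
        self.articles = list(articles)

    def __iter__(self):
        return iter(self.articles)


def append_extra_feeds(feeds, find_links):
    """Return the original feeds plus one extra feed per matching article.

    find_links(url) is a hypothetical callback returning a list of
    {'title': ..., 'url': ...} dicts, standing in for the HTML scraping step.
    """
    extra = []  # collected separately so `feeds` is not modified while iterating
    for feed in feeds:
        for article in feed:
            if article.url and 'longreads.com' in article.url:
                links = find_links(article.url)
                if links:
                    # Wrap the dicts in a feed object instead of appending a tuple.
                    wrapped = [StubArticle(d['title'], d['url']) for d in links]
                    extra.append(StubFeed(feed.title, wrapped))
    return feeds + extra
```

The two points carried over to a real recipe would be: build the extra entries in a separate list and add them after the loop, and make every element of the returned list the same kind of object as the ones the base parse_feeds produced.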

Grateful for any help with this, there is obviously something simple that I cannot see...