|
|
#1 |
|
Junior Member
Posts: 9
Karma: 10
Join Date: May 2023
Device: Onyx Boox Nova Air
|
Appending articles to a feed fails
Hi,
I've been using Calibre for a few years and have also used the recipe function to download news every day. Most of the recipes I use are slightly modified and based on RSS feeds. I'm currently stumped trying to add a few articles from an HTML page to a previously populated feed. I have looked at the example here of how to add articles and I have also looked at several existing recipes. What I am doing is as follows:
The links are extracted correctly. The issue is that I always get the error 'tuple' object has no attribute 'title'. The example I based it on is obviously old, but I also see several newer recipes where using the append function on the feed works. Outputting the feeds array shows the following (excerpt), so the links are clearly being added in the wrong form: a plain ('section', [...]) tuple sits next to the regular feed entries.
Code:
____________________
Title : SolarWinds: The Untold Story of the Boldest Supply-Chain Hack
URL : https://www.wired.com/story/the-untold-story-of-solarwinds-the-boldest-supply-chain-hack-ever/
Author : Kim Zetter
Summary : The attackers were i...
Date : Tue, 02 May, 2023 12:00
TOC thumb : None
Has content : False
, ('section', [{'title': '1. A Trucker’s Kidnapping, a Suspicious Ransom, and a Colorado Family’s Perilous Quest for Justice', 'url': 'https://www.5280.com/a-truckers-kidnapping-a-suspicious-ransom-and-a-colorado-familys-perilous-quest-for-justice/?src=longreads'},
Grateful for any help with this. There is obviously something simple that I cannot see...
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
feeds is a list of Feed objects. The form (title, list of articles) is used in parse_index(), not parse_feeds().
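To illustrate the distinction, here is a minimal sketch that runs outside calibre. `FakeFeed` and `feed_titles` are hypothetical stand-ins invented for this illustration (only `title` and `articles` are modelled); they are not calibre's API. The sketch shows why appending a parse_index()-style tuple to a list of Feed-like objects produces the error from the first post as soon as something reads `.title` from every entry.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFeed:
    """Hypothetical stand-in for a calibre Feed object (illustration only)."""
    title: str
    articles: list = field(default_factory=list)


def feed_titles(feeds):
    """Mimic downstream code that expects every entry to be Feed-like."""
    return [feed.title for feed in feeds]


feeds = [FakeFeed('Wired', [{'title': 'SolarWinds', 'url': 'https://www.wired.com/'}])]

# parse_index()-style entry: a plain (title, list of article dicts) tuple.
index_entry = ('Long Reads', [{'title': 'Example', 'url': 'https://example.com/'}])

# Appending the tuple directly mixes two different shapes in one list.
broken = feeds + [index_entry]
try:
    feed_titles(broken)
    error = None
except AttributeError as exc:
    error = str(exc)

# Wrapping the tuple in a Feed-like object first keeps the list uniform.
title, articles = index_entry
fixed = feeds + [FakeFeed(title, articles)]
titles = feed_titles(fixed)
```

In a real recipe the wrapping step would be done by calibre itself rather than by a hand-written class; the stand-in is only there so the failure can be reproduced without calibre installed.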
#3 | |
|
Junior Member
Posts: 9
Karma: 10
Join Date: May 2023
Device: Onyx Boox Nova Air
|
The code that works in the end looks like this, using the built-in feeds_from_index function to create Feed objects:
Code:
# imports the method relies on; these go at the top of the recipe file
from datetime import date
from calibre import browser
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds import feeds_from_index

# inside the LongReads recipe class: override parse_feeds and then add the
# links from the Long Reads HTML page to the feeds list
def parse_feeds(self):
    feeds = super(LongReads, self).parse_feeds()
    # Loop through existing articles until we hit the one from the Long Reads website
    newArticles = []
    for curfeed in feeds:
        for curarticle in curfeed.articles:
            # found the Long Reads page, extract links and summary using standard BS functions
            if curarticle.url and 'longreads.com' in curarticle.url:
                raw = browser().open_novisit(curarticle.url).read()
                soup = BeautifulSoup(raw)
                for item in soup.findAll('a', attrs={'target': '_blank'}):
                    if item.parent.name == 'h3':
                        # found a link, create a new dictionary entry in basic article format and add to list
                        newArticles.append({
                            'title': item.string,
                            'date': date.today(),
                            'url': item['href'],
                            'description': item.parent.findNext('p').findNext('p').contents[0]
                        })
                # If there are any links, create/append a new Feed object
                if len(newArticles) > 0:
                    # use built-in function to create Feed objects from a list of dictionaries with article info
                    newfeeds = feeds_from_index([('Long Reads', newArticles)], oldest_article=self.oldest_article,
                                                max_articles_per_feed=self.max_articles_per_feed)
                    # add the new Feed objects to the existing feed list one by one
                    for newfeed in newfeeds:
                        feeds.append(newfeed)
                    # finally delete the original feed as it is just a link page;
                    # returning straight away means the list is not iterated further after the change
                    feeds.remove(curfeed)
                    return feeds
    # in case the Long Reads page was not downloaded we have this catch-all for returning feeds
    return feeds
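For anyone who wants to check the link filter (an `<a target="_blank">` whose parent is an `<h3>`) without running a full recipe, here is a rough offline sketch. It deliberately swaps BeautifulSoup for the standard library's `html.parser`, so it is not the code from the recipe above. `H3LinkParser`, the sample HTML and the example.com URLs are all made up for illustration, and the description lookup is left out.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag, so they must not be tracked as parents.
VOID_TAGS = {'br', 'img', 'hr', 'input', 'meta', 'link'}


class H3LinkParser(HTMLParser):
    """Collect <a target="_blank"> links whose direct parent is an <h3>."""

    def __init__(self):
        super().__init__()
        self.stack = []      # names of the currently open tags
        self.current = None  # article dict being filled, if any
        self.articles = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        if tag == 'a' and parent == 'h3' and attrs.get('target') == '_blank':
            self.current = {'title': '', 'url': attrs.get('href', '')}
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag == 'a' and self.current is not None:
            self.current['title'] = self.current['title'].strip()
            self.articles.append(self.current)
            self.current = None
        if tag in self.stack:
            # pop back to (and including) the matching open tag
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.current is not None:
            self.current['title'] += data


# Made-up sample page: only the first link matches both conditions.
sample = '''
<h3><a href="https://example.com/one" target="_blank">First story</a></h3>
<p>Source</p><p>Summary one</p>
<p><a href="https://example.com/skip" target="_blank">Not in h3</a></p>
<h3><a href="https://example.com/two">No target</a></h3>
'''

parser = H3LinkParser()
parser.feed(sample)
articles = parser.articles
```

The resulting `articles` list uses the same `title`/`url` keys as the dictionaries built in the recipe, so the filter can be tried against a saved copy of the page before it goes into parse_feeds().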
|
|
|
|
|