Quote:
Originally Posted by kovidgoyal
feeds is a list of Feed objects. The form (title, list of feeds) is used in parse_index() not parse_feeds().
Thanks for the quick response and for pointing me in the right direction; it shows why it's worth supporting Calibre! This should also teach me not to program at night when I'm tired.
The code that works in the end looks like this, using the built-in feeds_from_index function to create Feed objects:
Code:
# Override parse_feeds and then add the links from the Long Reads HTML page
# to the feeds list. Needs these imports at the top of the recipe:
#   from datetime import date
#   from calibre import browser
#   from calibre.web.feeds import feeds_from_index
#   from calibre.ebooks.BeautifulSoup import BeautifulSoup
def parse_feeds(self):
    feeds = super(LongReads, self).parse_feeds()
    newArticles = []
    # loop through the existing articles until we hit the one from the Long Reads website
    for curfeed in feeds:
        for curarticle in curfeed.articles:
            # found the Long Reads page: extract links and summaries with standard BS functions
            if curarticle.url and 'longreads.com' in curarticle.url:
                raw = browser().open_novisit(curarticle.url).read()
                soup = BeautifulSoup(raw)
                for item in soup.findAll('a', attrs={'target': '_blank'}):
                    if item.parent.name == 'h3':
                        # found a link: add a new dictionary entry in the basic article format
                        newArticles.append({
                            'title': item.string,
                            'date': date.today(),
                            'url': item['href'],
                            'description': item.parent.findNext('p').findNext('p').contents[0]
                        })
                # if any links were found, create and append a new Feed object
                if len(newArticles) > 0:
                    # built-in function that creates Feed objects from a list of
                    # (feed title, list of article dicts) tuples
                    newfeeds = feeds_from_index([('Long Reads', newArticles)],
                                                oldest_article=self.oldest_article,
                                                max_articles_per_feed=self.max_articles_per_feed)
                    # add the new Feed objects to the existing feed list, one by one
                    for newfeed in newfeeds:
                        feeds.append(newfeed)
                # finally, delete the original page as it is just a link page
                feeds.pop(feeds.index(curfeed))
                return feeds
    # catch-all for returning feeds in case the Long Reads page was not downloaded
    return feeds
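For anyone adapting this: as kovidgoyal notes above, feeds_from_index takes the same (title, list of article dicts) structure that parse_index() returns. A minimal standalone sketch of that structure (the titles, URL, and description below are made up for illustration):

```python
from datetime import date

# Hypothetical article entries in the basic article-dictionary format
# that both feeds_from_index() and parse_index() consume.
articles = [
    {
        'title': 'An Example Long Read',          # made-up title
        'date': date.today(),
        'url': 'https://longreads.com/example',   # made-up URL
        'description': 'Short summary shown under the article title.',
    },
]

# The index itself is a list of (feed title, list of article dicts) tuples.
index = [('Long Reads', articles)]

for feed_title, feed_articles in index:
    print(feed_title, len(feed_articles))
```

Inside a recipe you would pass `index` straight to feeds_from_index (or return it from parse_index) instead of printing it.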