MobileRead Forums - View Single Post

leader_montanus · 05-06-2023, 04:51 PM

Quote:

Originally Posted by kovidgoyal

feeds is a list of Feed objects. The form (title, list of feeds) is used in parse_index() not parse_feeds().

Thanks for the quick response and for pointing me in the right direction. It shows why it's worth supporting Calibre! This should also teach me not to program at night when I'm tired.

The code that works in the end looks like this, using the built-in feeds_from_index function to create Feed objects:

Code:

# Imports this snippet relies on at the top of the recipe
# (not shown in the original post, so treat the exact paths as an assumption)
from datetime import date

from calibre import browser
from calibre.ebooks.BeautifulSoup import BeautifulSoup
from calibre.web.feeds import feeds_from_index

  # Inside the LongReads recipe class: override parse_feeds and add the links
  # from the Long Reads HTML page to the feeds list
  def parse_feeds(self):
    feeds = super(LongReads, self).parse_feeds()

    # Loop through the existing articles until we hit the one from the Long Reads website
    newArticles = []
    for curfeed in feeds:
      for curarticle in curfeed.articles:

        # Found the Long Reads page: extract links and summaries with standard BeautifulSoup calls
        if curarticle.url and 'longreads.com' in curarticle.url:
          raw = browser().open_novisit(curarticle.url).read()
          soup = BeautifulSoup(raw)
          for item in soup.findAll('a', attrs={'target': '_blank'}):
            if item.parent.name == 'h3':
              # Found a link: add a dictionary in the basic article format to the list
              newArticles.append({
                'title': item.string,
                'date': date.today(),
                'url': item['href'],
                'description': item.parent.findNext('p').findNext('p').contents[0]
              })

          # If there are any links, build Feed objects from them
          if newArticles:
            # Use the built-in function to create Feed objects from
            # (title, list of article dictionaries) tuples
            newfeeds = feeds_from_index([('Long Reads', newArticles)],
                                        oldest_article=self.oldest_article,
                                        max_articles_per_feed=self.max_articles_per_feed)

            # Add each new Feed object to the existing list one by one
            # (extend rather than append, so the list is not nested)
            feeds.extend(newfeeds)

          # Finally, drop the feed that held the original page, as it is just a link page
          feeds.remove(curfeed)
          return feeds

    # Catch-all: if the Long Reads page was not downloaded, return the feeds unchanged
    return feeds
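
For anyone who wants to try the link-extraction step outside of calibre, here is a rough standalone sketch of the same idea. It is not the recipe code: it swaps calibre's BeautifulSoup for the standard library's html.parser, and the names H3LinkParser, build_index and SAMPLE_HTML are made up for illustration. It only collects title and url (no description), and the sample markup is a guess at the page layout, not the real Long Reads page. The result has the (title, list of article dictionaries) shape that the recipe above hands to feeds_from_index.

```python
from html.parser import HTMLParser


class H3LinkParser(HTMLParser):
    """Collect <a target="_blank"> links whose direct parent is an <h3>."""

    def __init__(self):
        super().__init__()
        self.stack = []        # names of currently open tags, innermost last
        self.articles = []     # finished article dictionaries
        self._current = None   # article being filled while inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        parent = self.stack[-1] if self.stack else None
        if tag == 'a' and parent == 'h3' and attrs.get('target') == '_blank':
            self._current = {'title': '', 'url': attrs.get('href')}
        self.stack.append(tag)

    def handle_data(self, data):
        # Text inside a matching link becomes the article title
        if self._current is not None:
            self._current['title'] += data

    def handle_endtag(self, tag):
        if tag == 'a' and self._current is not None:
            self.articles.append(self._current)
            self._current = None
        # Unwind the stack down to (and including) the tag being closed
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def build_index(html, feed_title='Long Reads'):
    """Return [(feed_title, articles)] or an empty list if no links were found."""
    parser = H3LinkParser()
    parser.feed(html)
    parser.close()
    return [(feed_title, parser.articles)] if parser.articles else []


# Hypothetical sample markup, only for demonstration
SAMPLE_HTML = """
<div>
  <h3><a target="_blank" href="https://example.com/one">First story</a></h3>
  <p>Author</p><p>Summary one</p>
  <h3><a href="https://example.com/skip">No target attribute</a></h3>
  <p><a target="_blank" href="https://example.com/nav">Not in a heading</a></p>
  <h3><a target="_blank" href="https://example.com/two">Second story</a></h3>
</div>
"""

index = build_index(SAMPLE_HTML)
for feed_title, articles in index:
    print(feed_title, [a['url'] for a in articles])
```

Only the two links that sit directly inside an h3 and carry target="_blank" end up in the list. The sketch does not handle void tags such as br inside the heading, so it would need more care on real pages.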