View Single Post
Old 04-23-2017, 04:37 PM   #1
atsiong1
Junior Member
atsiong1 began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Apr 2017
Device: Kindle
Trouble turning a non-RSS webpage into a "feed"

Hi, I have been trying to rework this custom recipe example for the New York Times to create a custom recipe that will pull all the articles from this webpage (not an RSS feed).

In theory it seems straightforward enough – I have identified which html elements contain the feed (ul), articles (li), article title (a), article link (a href) and author name (i). But I am new to Python and to recipes, and each of my attempts so far has resulted in a “TypeError: 'NoneType' object is not iterable.”
My attempt:

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Adoption(BasicNewsRecipe):

    title       = 'Transracial Adoption/Interracial Adoption'
    __author__  = 'Mrs. Magoo'
    description = 'Articles from Pact Adopt'
    timefmt = ' [%a, %d %b, %Y]'
    remove_tags_before = dict(name='li')
    remove_tags_after  = dict(name='li')


    def parse_index(self):
        soup = self.index_to_soup('http://www.pactadopt.org/resources/transracial-adoption-interracial-adoption.html')

        def feed_title(ul):
            return ''.join(ul.findAll(text=True, recursive=False)).strip()

        articles = {}
        key = None
        ans = []
        for ul in soup.findAll(True,
             attrs={'name':['li']}):

                 url = re.sub(r'\?.*', '', a['href'])
                 title = self.tag_to_string(a, use_alt=True).strip()
                 author = self.tag_to_string(i, use_alt=True).strip()
                 description = ''
                 pubdate = strftime('%Y')
                 summary = ''
I’m sure I must be misunderstanding how the elements of my webpage map to the structure of the NYT recipe.

Does anyone have any pointers? I’ve really been enjoying using Calibre to pull in RSS feeds and would love to expand my skills to non-RSS webpages as well.

Thanks!
atsiong1 is offline   Reply With Quote