Old 04-23-2017, 04:37 PM   #1
atsiong1
Junior Member
Trouble turning a non-RSS webpage into a "feed"

Hi, I have been trying to rework this custom recipe example for the New York Times to create a custom recipe that will pull all the articles from this webpage (not an RSS feed).

In theory it seems straightforward enough – I have identified which HTML elements contain the feed (ul), articles (li), article title (a), article link (a href) and author name (i). But I am new to Python and to recipes, and each of my attempts so far has resulted in “TypeError: 'NoneType' object is not iterable”.
My attempt:

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Adoption(BasicNewsRecipe):

    title       = 'Transracial Adoption/Interracial Adoption'
    __author__  = 'Mrs. Magoo'
    description = 'Articles from Pact Adopt'
    timefmt = ' [%a, %d %b, %Y]'
    remove_tags_before = dict(name='li')
    remove_tags_after  = dict(name='li')


    def parse_index(self):
        soup = self.index_to_soup('http://www.pactadopt.org/resources/transracial-adoption-interracial-adoption.html')

        def feed_title(ul):
            return ''.join(ul.findAll(text=True, recursive=False)).strip()

        articles = {}
        key = None
        ans = []
        for ul in soup.findAll(True,
             attrs={'name':['li']}):

                 url = re.sub(r'\?.*', '', a['href'])
                 title = self.tag_to_string(a, use_alt=True).strip()
                 author = self.tag_to_string(i, use_alt=True).strip()
                 description = ''
                 pubdate = strftime('%Y')
                 summary = ''
I’m sure I must be misunderstanding how the elements of my webpage map to the structure of the NYT recipe.

Does anyone have any pointers? I’ve really been enjoying using Calibre to pull in RSS feeds and would love to expand my skills to non-RSS webpages as well.

Thanks!
Old 04-23-2017, 10:37 PM   #2
kovidgoyal
creator of calibre
That error means one of your findAll/find() calls is not finding anything.
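
To illustrate where the message itself comes from: Python raises it whenever code tries to loop over None. The following is a minimal sketch in plain Python, not calibre code; find_list is a hypothetical stand-in that, like find(), returns None when nothing matches, and page and titles_in are likewise made up for the illustration.

```python
def find_list(page, name):
    """Hypothetical stand-in for soup.find(): the match, or None."""
    return page.get(name)


def titles_in(container):
    """Loop over a container of titles; fails if container is None."""
    return [item.title() for item in container]


page = {'ul': ['article one', 'article two']}

found = find_list(page, 'ul')    # a list of two strings
missing = find_list(page, 'ol')  # None, because the page has no 'ol'

try:
    titles_in(missing)
except TypeError as err:
    error_message = str(err)     # the same message as in the question

# Checking for None before looping avoids the crash.
safe_titles = titles_in(missing) if missing is not None else []
```

The same check (`if tag is not None`) is a common way to guard lookups in a recipe whose page layout may not match expectations.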

Looking over your recipe quickly, I see, for example, findAll(attrs={'name':'li'})

If you want to find <li> tags, you do

findAll('li')
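
Applied to the recipe from the first post, a corrected parse_index might look roughly like the sketch below. It assumes each article on the page is an <li> holding an <a> link and, optionally, an <i> author name, as described in the question. The INDEX_URL constant, the choice to store the author in the description field, and the try/except import fallback (there only so the file can be loaded outside calibre) are assumptions of this sketch, not part of the original recipe.

```python
from urllib.parse import urljoin

try:
    from calibre.web.feeds.recipes import BasicNewsRecipe
except ImportError:
    # Assumption: fallback so the file can be loaded outside calibre for a
    # quick check; inside calibre the real base class is used.
    BasicNewsRecipe = object

INDEX_URL = ('http://www.pactadopt.org/resources/'
             'transracial-adoption-interracial-adoption.html')


class Adoption(BasicNewsRecipe):

    title = 'Transracial Adoption/Interracial Adoption'
    __author__ = 'Mrs. Magoo'
    description = 'Articles from Pact Adopt'
    timefmt = ' [%a, %d %b, %Y]'

    def parse_index(self):
        soup = self.index_to_soup(INDEX_URL)
        articles = []
        # Pass the tag name itself. attrs={'name': 'li'} would instead look
        # for tags that carry an HTML attribute called "name".
        for li in soup.findAll('li'):
            a = li.find('a')
            href = a.get('href') if a is not None else None
            if not href:
                continue  # list item without a usable link
            i = li.find('i')
            author = self.tag_to_string(i).strip() if i is not None else ''
            articles.append({
                'title': self.tag_to_string(a).strip(),
                'url': urljoin(INDEX_URL, href),  # resolve relative links
                'description': author,
                'date': '',
            })
        # parse_index must return a list of (feed title, articles) tuples.
        return [(self.title, articles)]
```

Two other things may be worth checking in the original attempt. It never returns anything from parse_index, and a Python function that ends without a return statement yields None, which on its own could produce the same 'NoneType' error. Also, findAll('li') matches every list item on the page, so menu entries may need to be filtered out, for example by first locating the specific <ul>. The remove_tags_before and remove_tags_after settings are left out here because they act on the downloaded article pages rather than on the index page.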
Old 04-25-2017, 07:30 PM   #3
atsiong1
Junior Member
Thank you!