04-23-2017, 04:37 PM | #1 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2017
Device: Kindle
|
Trouble turning a non-RSS webpage into a "feed"
Hi, I have been trying to rework this custom recipe example for the New York Times to create a custom recipe that will pull all the articles from this webpage (not an RSS feed).
In theory it seems straightforward enough – I have identified which HTML elements contain the feed (ul), the articles (li), the article title (a), the article link (a href) and the author name (i). But I am new to Python and to recipes, and every attempt so far has failed with “TypeError: 'NoneType' object is not iterable”. My attempt: Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class Adoption(BasicNewsRecipe):

    title = 'Transracial Adoption/Interracial Adoption'
    __author__ = 'Mrs. Magoo'
    description = 'Articles from Pact Adopt'
    timefmt = ' [%a, %d %b, %Y]'
    remove_tags_before = dict(name='li')
    remove_tags_after = dict(name='li')

    def parse_index(self):
        soup = self.index_to_soup('http://www.pactadopt.org/resources/transracial-adoption-interracial-adoption.html')

        def feed_title(ul):
            return ''.join(ul.findAll(text=True, recursive=False)).strip()

        articles = {}
        key = None
        ans = []
        for ul in soup.findAll(True, attrs={'name':['li']}):
            url = re.sub(r'\?.*', '', a['href'])
            title = self.tag_to_string(a, use_alt=True).strip()
            author = self.tag_to_string(i, use_alt=True).strip()
            description = ''
            pubdate = strftime('%Y')
            summary = ''

Does anyone have any pointers? I’ve really been enjoying using Calibre to pull in RSS feeds and would love to expand my skills to non-RSS webpages as well. Thanks!
04-23-2017, 10:37 PM | #2 |
creator of calibre
Posts: 43,857
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That error means one of your findAll/find() calls is not finding anything.
Looking over your recipe quickly, I see, for example, findAll(attrs={'name':'li'}). That searches for tags that have an HTML attribute called name with the value li, not for <li> tags. If you want to find <li> tags, use findAll('li').
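A minimal sketch of the overall idea, written so that it runs outside calibre. It is not the recipe itself: the sample markup, the `ListItemParser` class and the `build_index` helper are all made up for illustration, and the standard-library `html.parser` stands in for the `self.index_to_soup(...)` / `soup.findAll('li')` calls a real recipe would use. What it does show is the shape calibre documents for the return value of `parse_index`: a list of `(feed title, list of article dicts)` tuples. A `parse_index` that never returns anything hands back `None`, which is one plausible source of a "'NoneType' object is not iterable" error. The `author` key in the dicts is an assumption; the documented keys are `title`, `url`, `date`, `description` and `content`.

```python
from html.parser import HTMLParser

# Made-up sample markup standing in for the real page: <ul> holds the list,
# each <li> is one article, <a> carries title and link, <i> the author.
SAMPLE_HTML = """
<ul>
  <li><a href="/articles/one.html?ref=list">First Article</a> by <i>Jane Doe</i></li>
  <li><a href="/articles/two.html">Second Article</a> by <i>John Roe</i></li>
  <li>No link here, so this item is skipped</li>
</ul>
"""


class ListItemParser(HTMLParser):
    """Collect one dict per <li> that contains an <a href=...> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.articles = []
        self._item = None   # dict for the <li> currently being read, or None
        self._field = None  # 'title' while inside <a>, 'author' while inside <i>

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self._item = {'title': '', 'url': None, 'author': ''}
        elif self._item is not None and tag == 'a':
            href = dict(attrs).get('href')
            if href:
                # Drop any query string, as the posted recipe tried to do
                self._item['url'] = href.split('?', 1)[0]
                self._field = 'title'
        elif self._item is not None and tag == 'i':
            self._field = 'author'

    def handle_data(self, data):
        if self._item is not None and self._field:
            self._item[self._field] += data

    def handle_endtag(self, tag):
        if tag in ('a', 'i'):
            self._field = None
        elif tag == 'li' and self._item is not None:
            if self._item['url']:  # guard: skip list items without a link
                self.articles.append(self._item)
            self._item = None


def build_index(html, feed_title='Articles'):
    """Return data in the shape parse_index is expected to return."""
    parser = ListItemParser()
    parser.feed(html)
    articles = [
        {'title': a['title'].strip(), 'url': a['url'],
         'author': a['author'].strip(), 'description': '', 'date': ''}
        for a in parser.articles
    ]
    # A list of (feed title, list of article dicts) tuples, explicitly returned
    return [(feed_title, articles)]


index = build_index(SAMPLE_HTML)
```

In an actual recipe the same loop would live inside `parse_index`, iterate over `soup.findAll('li')`, look up the link with `li.find('a')`, skip the item when that lookup returns `None`, and end with the `return` statement. Relative links such as the ones in the sample would also need to be turned into absolute URLs before calibre can download them.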
04-25-2017, 07:30 PM | #3 |
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2017
Device: Kindle
|
Thank you!