#1
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2017
Device: Kindle
Trouble turning a non-RSS webpage into a "feed"
Hi, I have been trying to rework this custom recipe example for the New York Times to create a custom recipe that will pull all the articles from this webpage (not an RSS feed).
In theory it seems straightforward enough – I have identified which HTML elements contain the feed (ul), articles (li), article title (a), article link (a href) and author name (i). But I am new to Python and to recipes, and each of my attempts so far has resulted in a “TypeError: 'NoneType' object is not iterable.”
My attempt:
Code:

    import string, re
    from calibre import strftime
    from calibre.web.feeds.recipes import BasicNewsRecipe
    from calibre.ebooks.BeautifulSoup import BeautifulSoup

    class Adoption(BasicNewsRecipe):

        title = 'Transracial Adoption/Interracial Adoption'
        __author__ = 'Mrs. Magoo'
        description = 'Articles from Pact Adopt'
        timefmt = ' [%a, %d %b, %Y]'
        remove_tags_before = dict(name='li')
        remove_tags_after = dict(name='li')

        def parse_index(self):
            soup = self.index_to_soup('http://www.pactadopt.org/resources/transracial-adoption-interracial-adoption.html')

            def feed_title(ul):
                return ''.join(ul.findAll(text=True, recursive=False)).strip()

            articles = {}
            key = None
            ans = []

            for ul in soup.findAll(True,
                    attrs={'name':['li']}):
                url = re.sub(r'\?.*', '', a['href'])
                title = self.tag_to_string(a, use_alt=True).strip()
                author = self.tag_to_string(i, use_alt=True).strip()
                description = ''
                pubdate = strftime('%Y')
                summary = ''

Does anyone have any pointers? I’ve really been enjoying using Calibre to pull in RSS feeds and would love to expand my skills to non-RSS webpages as well. Thanks!
#2
creator of calibre
Posts: 45,617
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
That error means one of your findAll()/find() calls is not finding anything.
Looking over your recipe quickly, I see, for example, findAll(attrs={'name':'li'}). That searches for tags that carry an attribute called name with the value li, not for <li> tags. If you want to find <li> tags, use findAll('li').
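As an illustration only, here is a minimal sketch of how the findAll('li') fix might fit into a complete parse_index. It assumes, following the first post's description, that each article is an <li> holding an <a href> for the title and link and an optional <i> for the author. The helper collect_articles and the constant INDEX_URL are hypothetical names introduced for this sketch, and the try/except fallback only exists so the helper can be tried outside calibre. Note that find() returns None when nothing matches, so the result is checked before use, and that parse_index has to return a list of (feed title, list of article dicts) tuples.

```python
from urllib.parse import urljoin

try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Not running inside calibre: placeholder base class (assumption for
    # this sketch) so the helper below can still be imported and tried out.
    BasicNewsRecipe = object

INDEX_URL = 'http://www.pactadopt.org/resources/transracial-adoption-interracial-adoption.html'


def collect_articles(soup, tag_to_string, base_url=INDEX_URL):
    """Hypothetical helper: build article dicts from every <li> that holds a link."""
    articles = []
    for li in soup.findAll('li'):        # match by tag name, not by a 'name' attribute
        a = li.find('a', href=True)
        if a is None:                    # find() returns None when nothing matches
            continue
        i = li.find('i')                 # author tag may be missing
        author = tag_to_string(i).strip() if i is not None else ''
        articles.append({
            'title': tag_to_string(a).strip(),
            'url': urljoin(base_url, a['href']),   # resolve relative links
            'description': author,       # author shown as the summary (a choice made here)
            'date': '',
        })
    return articles


class Adoption(BasicNewsRecipe):
    title = 'Transracial Adoption/Interracial Adoption'
    __author__ = 'Mrs. Magoo'
    description = 'Articles from Pact Adopt'

    def parse_index(self):
        soup = self.index_to_soup(INDEX_URL)
        articles = collect_articles(soup, self.tag_to_string)
        # One feed, named after the recipe, containing every article found
        return [(self.title, articles)]
```

If the page also uses <li> elements for navigation menus, those links would be picked up as well. In that case it may help to first find() the <ul> that contains the articles and call findAll('li') on that tag instead of on the whole soup.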
#3
Junior Member
Posts: 4
Karma: 10
Join Date: Apr 2017
Device: Kindle
Thank you!