#1
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2012
Device: kindle fire
Seeking help with simple recipe for seedmagazine.com
Hi All,
This is my first recipe and my first Python code, which may explain any stupid questions. I'm trying to emulate existing recipes to get articles from a site that has no RSS feed, in this case http://www.seedmagazine.com. I've looked at their source HTML and, as far as I understand it, to parse the index I want every link on the page that goes to an article. That means a URL that starts with http://seedmagazine.com/content/article/... (Actually, I want the print version of those articles, which is a pretty easy substitution.)

I'm attaching my current recipe. It almost works, but instead of getting all the article links on the main page, it gets only the first two, and I can't figure out why. Shouldn't soup.findAll('a') return all the anchor tags on the page?

I'd appreciate any advice to get past that problem, and any advice in general, because I really don't know how to put the finishing touches on this recipe.

Thanks!
-Dave

Code:
    from calibre.web.feeds.recipes import BasicNewsRecipe


    class seedmagazine(BasicNewsRecipe):
        title = u'Seed Magazine'
        description = u'seedmagazine.com'
        oldest_article = 31
        max_articles_per_feed = 5  # keep this number small until recipe works

        def parse_index(self):
            articles = []
            feeds = []
            seen = set()  # article URLs already added, to skip duplicate links
            soup = self.index_to_soup('http://www.seedmagazine.com')
            for link in soup.findAll('a'):
                # link['href'] raises KeyError for anchors without an href
                url = link.get('href', '')
                title = self.tag_to_string(link)
                if title and '/content/article/' in url and url not in seen:
                    seen.add(url)
                    articles.append({'title': title,
                                     'url': self.print_version(url)})
            if articles:
                feeds.append((self.title, articles))
            return feeds

        def print_version(self, url):
            # The print page uses the same path with /print/ instead of /article/
            return url.replace('/article/', '/print/')
#2
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
As far as I can see, your code works. What command did you use to run it? Use
ebook-convert see.recipe .epub --debug-pipeline p -vv
If the command you used also had the word "test" in it, only 2 articles will show up. I have attached the EPUB produced by running your code with the above command.
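
If you want to check the link filter on its own, outside calibre, a minimal sketch along these lines may help. It uses only Python's standard html.parser instead of calibre's BeautifulSoup, so it is not the recipe itself; the class ArticleLinkCollector, the helper article_links, and the sample HTML with its URLs are all made up for illustration.

```python
from html.parser import HTMLParser


def print_version(url):
    # Same substitution as in the recipe: /article/ becomes /print/
    return url.replace('/article/', '/print/')


class ArticleLinkCollector(HTMLParser):
    """Collect (title, print_url) pairs for anchors that point at articles."""

    def __init__(self):
        super().__init__()
        self.links = []     # finished (title, print_url) pairs
        self._href = None   # href of the article anchor currently open
        self._text = []     # text fragments seen inside that anchor

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            # Anchors without an href yield None here, so fall back to ''
            href = dict(attrs).get('href') or ''
            if '/content/article/' in href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:  # skip image-only links that have no text
                self.links.append((title, print_version(self._href)))
            self._href = None


def article_links(html):
    parser = ArticleLinkCollector()
    parser.feed(html)
    return parser.links


# Made-up sample page: three article links, one image-only article link,
# one anchor without an href, and one unrelated link.
SAMPLE = """
<a name="top"></a>
<a href="http://seedmagazine.com/about/">About</a>
<a href="http://seedmagazine.com/content/article/first_story/">First story</a>
<a href="http://seedmagazine.com/content/article/first_story/"><img src="x.png"></a>
<a href="http://seedmagazine.com/content/article/second_story/">Second story</a>
<a href="http://seedmagazine.com/content/article/third_story/">Third story</a>
"""

for title, url in article_links(SAMPLE):
    print(title, url)
```

If a check like this finds every article link but the EPUB still has only two, the limit is coming from how the recipe was run and not from the filter.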
#3
Junior Member
Posts: 2
Karma: 10
Join Date: Nov 2012
Device: kindle fire
Oh thanks for that. I was running the command with --test (I had copied it that way). I removed the --test and now I'm on my way again.
#4
Vox calibre
Posts: 412
Karma: 1175230
Join Date: Jan 2009
Device: Sony reader prs700, kobo
Congratulations on your first Python recipe. May you have many more.
#5
Member
Posts: 13
Karma: 10
Join Date: Apr 2013
Device: Kobo Aura HD
Hey, did you finish this? I was looking for a recipe for Seed myself.