02-18-2012, 12:29 PM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2012
Device: Sony PRS-350
|
First Article Repeated for Friday Fax
I'm new to Calibre and Python. I've only read up to Chapter 6 in the Python Tutorial. I'm a C++ programmer who has done almost nothing but PHP programming for the last seven years, so I hardly know what I'm doing with Calibre recipes.
This is my first complex recipe:
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1328808344(BasicNewsRecipe):
    title = u'C-Fam Friday Fax'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    def parse_index(self):
        soup = self.index_to_soup('http://www.c-fam.org/fridayfax/')
        articles = []
        feeds = []
        for div in soup.findAll('div'):
            a = div.find('a', href=True, attrs={'class':'ffArchiveLink'})
            if not a:
                continue
            url = 'http://www.c-fam.org/' + a['href']
            title = ''.join(a.findAll(text=True, recursive=False)).strip()
            i = div.find('i')
            if not i:
                pubdate = strftime('%a, %d %b')
            else:
                pubdate = ''.join(i.findAll(text=True, recursive=False)).strip()
            description = ''
            articles.append({'title' : title, 'url' : url, 'date' : pubdate, 'description' : description})
        feeds.append((self.title, articles))
        return feeds
Code:
def getSetURL(articles):
    ans = []
    for article in articles:
        ans.append(article['url'])
    return ans

url = 'http://www.c-fam.org/' + a['href']
if url in getSetURL(articles):
    continue
02-19-2012, 12:04 AM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use this instead:
Code:
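# Before the loop (set(), not {}: a dict has no add() method):
seen_articles = set()

# Inside the loop, once a has been found:
if a['href'] in seen_articles:
    continue
seen_articles.add(a['href'])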
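For anyone who wants to try the seen-set idea outside of Calibre, here is a minimal sketch. The helper name `unique_articles` and the list of `(href, title)` pairs are made up for illustration and stand in for the links the recipe pulls out of the page; only the `seen_articles` check mirrors the snippet above.

```python
def unique_articles(links, base_url='http://www.c-fam.org/'):
    """Build article dicts, skipping any href that was already seen.

    `links` is a hypothetical stand-in for the anchors found on the
    index page: an iterable of (href, title) pairs.
    """
    seen_articles = set()  # created once, before the loop
    articles = []
    for href, title in links:
        if href in seen_articles:
            continue  # duplicate link, skip it
        seen_articles.add(href)
        articles.append({'title': title, 'url': base_url + href})
    return articles


# Made-up sample data: the first link appears twice.
links = [('a.html', 'First'), ('a.html', 'First'), ('b.html', 'Second')]
result = unique_articles(links)
```

Membership tests on a set do not require rebuilding a list of URLs on every pass through the loop, which is what the getSetURL approach does.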
02-21-2012, 02:55 PM | #3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2012
Device: Sony PRS-350
|