I'm new to Calibre and Python, and I've only read up to Chapter 6 of the Python Tutorial. I'm a C++ programmer who has done almost nothing but PHP programming for the last seven years, so I hardly know what I'm doing with Calibre recipes.
This is my first complex recipe:
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1328808344(BasicNewsRecipe):
    title = u'C-Fam Friday Fax'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    def parse_index(self):
        soup = self.index_to_soup('http://www.c-fam.org/fridayfax/')
        articles = []
        feeds = []
        for div in soup.findAll('div'):
            a = div.find('a', href=True, attrs={'class':'ffArchiveLink'})
            if not a:
                continue
            url = 'http://www.c-fam.org/' + a['href']
            title = ''.join(a.findAll(text=True, recursive=False)).strip()
            i = div.find('i')
            if not i:
                pubdate = strftime('%a, %d %b')
            else:
                pubdate = ''.join(i.findAll(text=True, recursive=False)).strip()
            description = ''
            articles.append({'title' : title,
                             'url' : url,
                             'date' : pubdate,
                             'description' : description})
        feeds.append((self.title, articles))
        return feeds
However, the first article gets repeated three times, so I added this code:
Code:
def getSetURL(articles):
    ans = []
    for article in articles:
        ans.append(article['url'])
    return ans

# Inside the for loop in parse_index:
url = 'http://www.c-fam.org/' + a['href']
if url in getSetURL(articles):
    continue
I'm sure this code shouldn't be necessary, but I can't figure out how to get rid of the repeats of the first article without it. What am I doing wrong with the original code? If nothing, is the code I added the best way to get rid of the repeated articles?
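One possible cause, not verified against the actual page: find() searches all descendants of a tag, and findAll('div') also returns nested divs. If the first link sits inside several wrapper divs, each enclosing div would return that same first link. Whatever the cause, the duplicate check can be sketched with a set of seen URLs instead of rebuilding a list on every pass. The names unique_articles, candidates, and sample below are hypothetical, and the (href, title) pairs stand in for what the loop extracts from each div.

```python
def unique_articles(candidates, base='http://www.c-fam.org/'):
    """Build article dicts from (href, title) pairs, skipping repeated URLs."""
    seen = set()      # URLs already added; membership tests on a set are fast
    articles = []
    for href, title in candidates:
        url = base + href
        if url in seen:
            continue  # same link reached again, e.g. via an enclosing div
        seen.add(url)
        articles.append({'title': title,
                         'url': url,
                         'date': '',
                         'description': ''})
    return articles


# Hypothetical sample: the first link shows up three times.
sample = [('a.html', 'First'), ('a.html', 'First'),
          ('a.html', 'First'), ('b.html', 'Second')]
result = unique_articles(sample)
print([article['url'] for article in result])
```

In the recipe itself, the same idea would amount to keeping a seen set next to articles inside parse_index and checking it before each append.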