#1
Junior Member
Join Date: Feb 2012
Device: Sony PRS-350

First Article Repeated for Friday Fax
I'm new to Calibre and Python; I've only read up to Chapter 6 of the Python Tutorial. I'm a C++ programmer who has done almost nothing but PHP programming for the last seven years, so I hardly know what I'm doing with Calibre recipes.
This is my first complex recipe:

```python
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1328808344(BasicNewsRecipe):
    title = u'C-Fam Friday Fax'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    def parse_index(self):
        soup = self.index_to_soup('http://www.c-fam.org/fridayfax/')
        articles = []
        feeds = []
        for div in soup.findAll('div'):
            a = div.find('a', href=True, attrs={'class':'ffArchiveLink'})
            if not a:
                continue
            url = 'http://www.c-fam.org/' + a['href']
            title = ''.join(a.findAll(text=True, recursive=False)).strip()
            i = div.find('i')
            if not i:
                pubdate = strftime('%a, %d %b')
            else:
                pubdate = ''.join(i.findAll(text=True, recursive=False)).strip()
            description = ''
            articles.append({'title' : title,
                             'url' : url,
                             'date' : pubdate,
                             'description' : description})
        feeds.append((self.title, articles))
        return feeds
```
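I also tried this to skip URLs that had already been added:

```python
def getSetURL(articles):
    # Collect the URL of every article added so far.
    ans = []
    for article in articles:
        ans.append(article['url'])
    return ans

# Inside the for loop in parse_index():
url = 'http://www.c-fam.org/' + a['href']
if url in getSetURL(articles):
    continue
```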
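A likely explanation for the repeated first article (not verified against the live c-fam.org page): `soup.findAll('div')` returns nested `<div>`s as well as the innermost ones, and `div.find('a', ...)` searches at any depth. If the article list sits inside a wrapper `<div>`, that wrapper also "finds" the first link, so the first article is appended twice. The sketch below reproduces the effect with only the standard library: `html.parser` stands in for calibre's BeautifulSoup, and the `Node`/`TreeBuilder` classes, the `links_per_div`/`links_direct` helpers and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class Node:
    """Minimal element node: tag name, attributes, parent and children."""

    def __init__(self, tag, attrs, parent=None):
        self.tag = tag
        self.attrs = dict(attrs)
        self.parent = parent
        self.children = []

    def iter_descendants(self):
        # Depth-first, in document order (outer elements before inner ones).
        for child in self.children:
            yield child
            yield from child.iter_descendants()

    def find(self, tag, cls=None):
        # First matching descendant at ANY depth, like a recursive find().
        for node in self.iter_descendants():
            if node.tag == tag and (cls is None or node.attrs.get('class') == cls):
                return node
        return None


class TreeBuilder(HTMLParser):
    """Builds a Node tree. Assumes well-formed HTML with no void tags."""

    def __init__(self):
        super().__init__()
        self.root = Node('root', [])
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        if self.current is not self.root:
            self.current = self.current.parent


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def links_per_div(html):
    # Mirrors the recipe: for every <div>, take the first matching link inside it.
    hrefs = []
    for node in parse(html).iter_descendants():
        if node.tag != 'div':
            continue
        a = node.find('a', cls='ffArchiveLink')
        if a is not None:
            hrefs.append(a.attrs['href'])
    return hrefs


def links_direct(html):
    # Alternative: visit each matching link exactly once.
    return [n.attrs['href'] for n in parse(html).iter_descendants()
            if n.tag == 'a' and n.attrs.get('class') == 'ffArchiveLink']


# Hypothetical markup: a wrapper <div> around the per-article <div>s.
SAMPLE = (
    '<div id="wrapper">'
    '<div><a class="ffArchiveLink" href="a1.php">First</a></div>'
    '<div><a class="ffArchiveLink" href="a2.php">Second</a></div>'
    '</div>'
)

print(links_per_div(SAMPLE))  # ['a1.php', 'a1.php', 'a2.php']
print(links_direct(SAMPLE))   # ['a1.php', 'a2.php']
```

`links_direct` corresponds to looping over the links themselves rather than over every `<div>`. In the recipe that would presumably mean iterating over `soup.findAll('a', attrs={'class':'ffArchiveLink'})` and locating each date relative to its link, though the exact lookup depends on the page's real markup.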

#2
creator of calibre
Join Date: Oct 2006
Location: Mumbai, India
Device: Various

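Use this instead:

```python
# Create an empty set once, before the for loop.
seen_articles = set()

# Then, inside the loop, once a has been found:
if a['href'] in seen_articles:
    continue
seen_articles.add(a['href'])
```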
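The set-based check can be tried outside calibre with a small stand-alone function. This is a sketch under assumptions: `build_articles`, the `(href, title, pubdate)` tuples and the sample hrefs and dates are hypothetical stand-ins for what `parse_index()` extracts from each `<div>`; only the article-dict keys and the base URL come from the recipe above.

```python
BASE_URL = 'http://www.c-fam.org/'


def build_articles(links):
    """Build calibre-style article dicts, skipping any href already seen.

    `links` is an iterable of hypothetical (href, title, pubdate) tuples.
    """
    seen_articles = set()  # created once, before the loop
    articles = []
    for href, title, pubdate in links:
        if href in seen_articles:
            continue  # duplicate link: skip it
        seen_articles.add(href)
        articles.append({'title': title,
                         'url': BASE_URL + href,
                         'date': pubdate,
                         'description': ''})
    return articles


links = [('a1.php', 'First', 'Fri, 10 Feb'),
         ('a1.php', 'First', 'Fri, 10 Feb'),
         ('a2.php', 'Second', 'Fri, 03 Feb')]
print([art['url'] for art in build_articles(links)])
# ['http://www.c-fam.org/a1.php', 'http://www.c-fam.org/a2.php']
```

A set makes each membership test roughly constant time, whereas `getSetURL` rebuilds a list of every URL on each pass through the loop, so its check grows with the number of articles already added.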

#3
Junior Member
Join Date: Feb 2012
Device: Sony PRS-350

Similar Threads

| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Repeated "Ignoring missing TOC entry" when converting PDF to MOBI | goldenhair | Calibre | 2 | 01-19-2011 11:30 AM |
| Repeated crash after computer connection | roguefan99 | Kobo Reader | 2 | 07-24-2010 12:36 AM |
| commande chez numilog le fax casse le charme. (Ordering from Numilog: the fax breaks the spell.) | discusaigon | E-Books | 13 | 07-17-2010 09:40 AM |
| Repeated Chapter Headings in Kobo Table of Contents | capsolo | Sigil | 5 | 06-20-2010 04:09 AM |
| ADE gives repeated instructions to update when it is already updated | Seabound | ePub | 4 | 02-25-2010 01:44 AM |