Old 02-18-2012, 12:29 PM   #1
rjgrigaitis
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2012
Device: Sony PRS-350
First Article Repeated for Friday Fax

I'm new to Calibre and Python. I've only read up to Chapter 6 of the Python Tutorial. I'm a C++ programmer who has done almost nothing but PHP programming for the last seven years, so I hardly know what I'm doing with Calibre recipes.

This is my first complex recipe:

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1328808344(BasicNewsRecipe):
    title          = u'C-Fam Friday Fax'
    oldest_article = 7
    max_articles_per_feed = 100
    auto_cleanup = True

    def parse_index(self):
        soup = self.index_to_soup('http://www.c-fam.org/fridayfax/')
        articles = []
        feeds = []

        for div in soup.findAll('div'):
            a = div.find('a', href=True, attrs={'class':'ffArchiveLink'})
            if not a:
                continue

            url = 'http://www.c-fam.org/' + a['href']
            title = ''.join(a.findAll(text=True, recursive=False)).strip()
            i = div.find('i')
            if not i:
                pubdate = strftime('%a, %d %b')
            else:
                pubdate = ''.join(i.findAll(text=True, recursive=False)).strip()

            description = ''
            articles.append({'title': title,
                             'url': url,
                             'date': pubdate,
                             'description': description})

        feeds.append((self.title, articles))

        return feeds
The first article gets repeated 3 times, though. Therefore I added this code:

Code:
            def getSetURL(articles):
                ans = []
                for article in articles:
                    ans.append(article['url'])
                return ans

            url = 'http://www.c-fam.org/' + a['href']
            if url in getSetURL(articles):
                continue
I'm sure this code shouldn't be necessary, but I can't figure out how to get rid of the repeats of the first article without it. What am I doing wrong with the original code? If nothing, is the code I added the best way to get rid of the repeated articles?
Old 02-19-2012, 12:04 AM   #2
kovidgoyal
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Use this instead:
Code:
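# Create this once, before the loop over the divs.
# It must be a set: {} is an empty dict, which has no add() method.
seen_articles = set()

# Inside the loop, right after the link has been found:
if a['href'] in seen_articles:
    continue
seen_articles.add(a['href'])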
That's probably necessary because the index page lists the first article in three different places, presumably to emphasize it.
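
For reference, the same idea can be written as a small standalone helper that runs outside Calibre. This is only a sketch: the name dedupe_by_url and the sample articles are made up for illustration and are not part of the Calibre API. It keeps the first occurrence of each URL and preserves the original order.

```python
def dedupe_by_url(articles):
    """Return the articles with repeated URLs dropped, keeping first-seen order."""
    seen_urls = set()
    unique = []
    for article in articles:
        url = article['url']
        if url in seen_urls:
            continue  # this URL was already collected
        seen_urls.add(url)
        unique.append(article)
    return unique


# Hypothetical sample data shaped like the dicts built in parse_index().
articles = [
    {'title': 'First', 'url': 'http://example.org/a1'},
    {'title': 'First', 'url': 'http://example.org/a1'},
    {'title': 'First', 'url': 'http://example.org/a1'},
    {'title': 'Second', 'url': 'http://example.org/a2'},
]

unique_articles = dedupe_by_url(articles)
print([a['title'] for a in unique_articles])
```

A set membership test takes roughly constant time, whereas rebuilding a list of every URL on each pass through the loop grows with the number of articles already collected.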
Old 02-21-2012, 02:55 PM   #3
rjgrigaitis
Quote:
Originally Posted by kovidgoyal View Post
That's probably necessary because the index page lists the first article in three different places, presumably to emphasize it.
That's what I thought; however, when I looked, it was just listed once. This is why I asked for help.
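
One possible explanation, which nobody in the thread has confirmed: if the page wraps the article links in nested divs, then looping over every div and searching inside each one finds the first link once per enclosing div. In BeautifulSoup, findAll('div') returns nested divs as well, and find() searches all descendants. The sketch below reproduces the effect with the standard library's xml.etree.ElementTree as a stand-in for BeautifulSoup. The markup is hypothetical, not the real C-Fam page.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: two wrapper divs around one div per article.
PAGE = """
<div id="page">
  <div id="content">
    <div class="item"><a class="ffArchiveLink" href="a1.html">First</a></div>
    <div class="item"><a class="ffArchiveLink" href="a2.html">Second</a></div>
  </div>
</div>
""".strip()

root = ET.fromstring(PAGE)

# Mirrors the recipe's loop: visit every div (nested ones included) and
# take the first matching link found anywhere beneath it.
via_divs = []
for div in root.iter('div'):
    a = div.find(".//a[@class='ffArchiveLink']")
    if a is None:
        continue
    via_divs.append(a.get('href'))

# Alternative: iterate over the links themselves, so each is seen once.
via_links = [a.get('href') for a in root.iter('a')
             if a.get('class') == 'ffArchiveLink']

print(via_divs)   # ['a1.html', 'a1.html', 'a1.html', 'a2.html']
print(via_links)  # ['a1.html', 'a2.html']
```

If that is what is happening here, looping over soup.findAll('a', attrs={'class':'ffArchiveLink'}) instead of over the divs might avoid the repeats without a seen-set, though the date lookup through div.find('i') would then have to start from the link's parent.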