Old 06-03-2011, 08:47 AM   #7
bowbow
Device: kindle 3
Hey,

That was the decisive tip; the recipe works now. It could be submitted to the repository. I have just one question left, which you'll find below the code.

Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1307036487(BasicNewsRecipe):
    title          = u'ChangeX Subscription'
    oldest_article = 7
    max_articles_per_feed = 100
    needs_subscription = True

    cover_url = 'https://7012901881146393470-a-1802744773732722657-s-sites.googlegroups.com/site/banglabeltze/Home/changex.png?attachauth=ANoY7coFJ1S94rp0tfSsNy40Vkvjz8v2yvVH6ivi5d_wHHwGKbwT9x3wTDGE-SNvpHN9dCG7oC6vEvGFZz7Z75qO5Ho_iXE2_Fr7jqzCBP8kmfRwmGkUlGJMCnQKO52m3u12QHbzEaydSpELKDDc_tKHnOj6OZ-ZRCLuiJYUBM4xYVX43sIh9hvp9mGrlvzPc6mWOYPQAOhmu1p28mLRDOASkEUG9ZZc0w%3D%3D&attredirects=1'


    remove_tags = [
        dict(name='div', attrs={'class':['right','optionbox','center']}),
        dict(name='div', attrs={'id':['header','footer']}),
        dict(name='a', attrs={'class':['top']}),
    ]

    # remove all hyperlinks, keeping only their link text
    def preprocess_html(self, soup):
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    # log in to changex.de when a username and password are supplied
    def get_browser(self):
        br = BasicNewsRecipe.get_browser()
        br.open('http://www.changex.de/')
        if self.username is not None and self.password is not None:
            br.open('http://www.changex.de/Login')
            # nr is zero-based, so this selects the second form on the page
            br.select_form(nr=1)
            br['username']   = self.username
#            br['nutzername']   = self.username
            br['password'] = self.password
#            br['passwort'] = self.password
            br.submit()
        return br

    feeds          = [
        (u'Arbeit und Leben', u'http://www.changex.de/Feed/ArbeitUndLeben/RSS20'),
        (u'Wirtschaft und Management', u'http://www.changex.de/Feed/WirtschaftUndManagement/RSS20'),
        (u'Wissen und Lernen', u'http://www.changex.de/Feed/WissenUndLernen/RSS20'),
    ]
Just one question left:
- I wanted to create an archive of past articles first, so I set the limits as follows:
Code:
    oldest_article = 900
    max_articles_per_feed = 1000
When I open the RSS feed (e.g. http://www.changex.de/Feed/ArbeitUndLeben/RSS20)
with Google Reader, I get a bunch of articles reaching back to 01.01.2010. Calibre, however, only processes articles back to Nov 23 2010, and even leaves out some entries from that very same day. That seems very odd to me.
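
To narrow this down, one could check what the server itself delivers, independent of both Google Reader and Calibre. Below is a minimal sketch of such a check, assuming the feed is plain RSS 2.0 with `pubDate` values in the usual RFC 822 format. The helper name `feed_date_range` is made up for illustration and is not part of calibre.

```python
from datetime import datetime, timezone
from email.utils import mktime_tz, parsedate_tz
import xml.etree.ElementTree as ET


def feed_date_range(rss_xml):
    """Return (item_count, oldest, newest) for an RSS 2.0 document.

    rss_xml may be a str or bytes. Items without a parseable pubDate
    are counted but ignored for the date range. Dates are returned in UTC.
    """
    root = ET.fromstring(rss_xml)
    items = root.findall('./channel/item')

    stamps = []
    for item in items:
        # parsedate_tz returns None for missing or unparseable dates
        parsed = parsedate_tz(item.findtext('pubDate') or '')
        if parsed is not None:
            stamps.append(mktime_tz(parsed))

    if not stamps:
        return len(items), None, None

    def to_utc(ts):
        return datetime.fromtimestamp(ts, tz=timezone.utc)

    return len(items), to_utc(min(stamps)), to_utc(max(stamps))


# Possible usage against the live feed (needs network access):
#
#   from urllib.request import urlopen
#   with urlopen('http://www.changex.de/Feed/ArbeitUndLeben/RSS20') as resp:
#       print(feed_date_range(resp.read()))
```

If the oldest date reported here is also around Nov 23 2010, the limit would lie in the feed itself rather than in the recipe settings, and Google Reader would presumably be showing older entries from its own stored history.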

I know recipes are meant only for regular downloads of recent articles.
Do you nonetheless have an idea how to correctly fetch all articles from that feed?

Thanks so much for your support!