Hey,
that was the decisive tip, and the recipe works now. It could be submitted to the repository. Just one question left; you'll find it below the code.
Code:
import string, re
from calibre import strftime
from calibre.web.feeds.recipes import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1307036487(BasicNewsRecipe):
    title = u'ChangeX Subscription'
    oldest_article = 7
    max_articles_per_feed = 100
    needs_subscription = True
    cover_url = 'https://7012901881146393470-a-1802744773732722657-s-sites.googlegroups.com/site/banglabeltze/Home/changex.png?attachauth=ANoY7coFJ1S94rp0tfSsNy40Vkvjz8v2yvVH6ivi5d_wHHwGKbwT9x3wTDGE-SNvpHN9dCG7oC6vEvGFZz7Z75qO5Ho_iXE2_Fr7jqzCBP8kmfRwmGkUlGJMCnQKO52m3u12QHbzEaydSpELKDDc_tKHnOj6OZ-ZRCLuiJYUBM4xYVX43sIh9hvp9mGrlvzPc6mWOYPQAOhmu1p28mLRDOASkEUG9ZZc0w%3D%3D&attredirects=1'

    remove_tags = [
        dict(name='div', attrs={'class':['right','optionbox','center']}),
        dict(name='div', attrs={'id':['header','footer']}),
        dict(name='a', attrs={'class':['top']}),
    ]

    # Remove all hyperlinks, keeping only their text
    def preprocess_html(self, soup):
        for alink in soup.findAll('a'):
            if alink.string is not None:
                tstr = alink.string
                alink.replaceWith(tstr)
        return soup

    # Log in before fetching, so subscriber-only articles are accessible
    def get_browser(self):
        # get_browser is an instance method, so self must be passed explicitly
        br = BasicNewsRecipe.get_browser(self)
        br.open('http://www.changex.de/')
        if self.username is not None and self.password is not None:
            br.open('http://www.changex.de/Login')
            br.select_form(nr=1)
            br['username'] = self.username
            # br['nutzername'] = self.username
            br['password'] = self.password
            # br['passwort'] = self.password
            br.submit()
        return br

    feeds = [
        (u'Arbeit und Leben', u'http://www.changex.de/Feed/ArbeitUndLeben/RSS20'),
        (u'Wirtschaft und Management', u'http://www.changex.de/Feed/WirtschaftUndManagement/RSS20'),
        (u'Wissen und Lernen', u'http://www.changex.de/Feed/WissenUndLernen/RSS20'),
    ]
Just one question left:
- I wanted to build an archive of past articles first, so I raised the limits as follows:
Code:
oldest_article = 900
max_articles_per_feed = 1000
When I open the RSS feed (e.g.
http://www.changex.de/Feed/ArbeitUndLeben/RSS20)
with Google Reader, I get a long list of articles reaching back to 01.01.2010. Calibre, however, only processes articles back to Nov 23 2010, and even skips some entries from that very same day. That seems very odd to me.
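To narrow this down, it might help to check what the server actually delivers in the raw feed, independent of both Calibre and Google Reader. Below is a minimal sketch using only the Python standard library (not Calibre's API). The function name summarize_rss and the SAMPLE feed are made up for illustration, and it assumes every pubDate carries a time zone offset. If the raw download contains fewer items than Google Reader shows, Reader may simply be displaying its own stored history of the feed.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

def summarize_rss(xml_text):
    """Return (item_count, oldest_pubdate, newest_pubdate) for an RSS 2.0 document.

    Hypothetical helper for diagnosis only; assumes all pubDate values
    include a time zone offset so they can be compared with each other.
    """
    root = ET.fromstring(xml_text)
    count = 0
    dates = []
    for item in root.iter('item'):
        count += 1
        pub = item.findtext('pubDate')
        if pub:
            # RSS 2.0 uses RFC 822 style dates, which email.utils can parse
            dates.append(parsedate_to_datetime(pub.strip()))
    if not dates:
        return count, None, None
    return count, min(dates), max(dates)

# Made-up sample standing in for the real feed content
SAMPLE = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Sample</title>
<item><title>A</title><pubDate>Fri, 01 Jan 2010 10:00:00 +0100</pubDate></item>
<item><title>B</title><pubDate>Tue, 23 Nov 2010 09:00:00 +0100</pubDate></item>
<item><title>C</title><pubDate>Tue, 23 Nov 2010 15:30:00 +0100</pubDate></item>
</channel></rss>"""

count, oldest, newest = summarize_rss(SAMPLE)
print(count, oldest.date(), newest.date())
```

To run it against the live feed, the bytes returned by urllib.request.urlopen(url).read() can be passed in place of SAMPLE.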
I know recipes are only meant for regular, frequent downloads.
Do you nonetheless have an idea how to correctly get all articles from that feed?
Thanks so much for your support!