Hey recipe gurus!
Being a man of honour, I first read through all the available instructions and examples and then tried my best to get
www.noz.de, a German newspaper, working properly. Luckily, the basic recipe algorithm works quite nicely and gets the basic job done.
Code:
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1344926684(BasicNewsRecipe):
    title = u'Neue Osnabrücker Zeitung'
    # Only fetch articles from the last 7 days, at most 100 per feed
    oldest_article = 7
    max_articles_per_feed = 100
    # Let calibre guess the main article content automatically
    auto_cleanup = True
    no_stylesheets = True
    use_embedded_content = False
    language = 'de'
    remove_javascript = True

    # One (section title, RSS URL) pair per category
    feeds = [(u'Lokales', u'http://www.noz.de/rss/Lokales'),
             (u'Vermischtes', u'http://www.noz.de/rss/Vermischtes'),
             (u'Politik', u'http://www.noz.de/rss/Politik'),
             (u'Wirtschaft', u'http://www.noz.de/rss/Wirtschaft'),
             (u'Kultur', u'http://www.noz.de/rss/Kultur'),
             (u'Medien', u'http://www.noz.de/rss/Medien'),
             (u'Wissenschaft', u'http://www.noz.de/rss/wissenschaft'),
             (u'Sport', u'http://www.noz.de/rss/Sport'),
             (u'Computer', u'http://www.noz.de/rss/Computer'),
             (u'Musik', u'http://www.noz.de/rss/Musik'),
             (u'Szene', u'http://www.noz.de/rss/Szene'),
             (u'Niedersachsen', u'http://www.noz.de/rss/Niedersachsen'),
             (u'Kino', u'http://www.noz.de/rss/Kino')]
But then I noticed that the algorithm has problems with some of the categories, e.g.
http://www.noz.de/rss/Computer. Somehow it does not recognize the main article text and picture and instead outputs previews of other articles from the same page (I have to admit that the pages are not very RSS-friendly).
So I used some of the tricks to filter out the bad parts and keep the good ones, but I couldn't get anything to work better than the basic recipe. If only I could have figured out how to produce some helpful debug output ...
Most of my attempts resulted in empty ebooks, and I couldn't locate my mistakes.
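A minimal sketch of this kind of manual filtering, with some debug output added, might look like the following. The class name and the two div class names ('article-body' and 'teaser-list') are placeholders and are not taken from noz.de; the real ones would have to be read from the page source. Switching auto_cleanup off is also an assumption: the idea is that the automatic cleanup and the manual filters may be working against each other, which could explain the empty ebooks. The try/except stub around the import is only there so the file can be checked outside calibre.

```python
try:
    # Inside calibre this import succeeds.
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback stub so the sketch can be syntax-checked outside calibre.
    class BasicNewsRecipe(object):
        pass


class NozFilteredSketch(BasicNewsRecipe):  # hypothetical class name
    title = u'Neue Osnabrücker Zeitung (filtered sketch)'
    language = 'de'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = False
    # Assumption: turn the automatic heuristic off so that only the
    # manual filters below decide what is kept.
    auto_cleanup = False

    # PLACEHOLDER selectors, not the real noz.de markup.
    keep_only_tags = [dict(name='div', attrs={'class': 'article-body'})]
    remove_tags = [dict(name='div', attrs={'class': 'teaser-list'})]

    # Start with the one feed that misbehaves.
    feeds = [(u'Computer', u'http://www.noz.de/rss/Computer')]

    def preprocess_html(self, soup):
        # Debug output: log how much text is left once the filters have run.
        # A length of 0 suggests that keep_only_tags matched nothing.
        pieces = soup.findAll(text=True)
        kept = sum(len(piece) for piece in pieces)
        self.log('kept text length: %d' % kept)
        return soup
```

Saved as, say, noz.recipe (a made-up file name), this can be run from the command line with "ebook-convert noz.recipe .epub --test -vv --debug-pipeline debug". The --test option keeps the download small, -vv makes the log verbose, and the debug folder receives the intermediate HTML, so it becomes visible at which stage the article text disappears.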
Who would like to help me cook this recipe?