MobileRead Forums - View Single Post

BobbyQ · 10-05-2012, 09:20 AM

Hey recipe gurus!

Being a man of honour, I read through all the available instructions and examples first and then tried my best to get www.noz.de, a German newspaper, working properly. Luckily, the basic recipe algorithm works quite nicely and gets the basic job done.

Code:

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1344926684(BasicNewsRecipe):
    title                 = u'Neue Osnabrücker Zeitung'
    language              = 'de'
    oldest_article        = 7      # in days
    max_articles_per_feed = 100
    auto_cleanup          = True   # let calibre guess the main article content
    no_stylesheets        = True
    use_embedded_content  = False  # fetch the linked page, not the feed body
    remove_javascript     = True

    # (section title, feed URL) pairs
    feeds = [
        (u'Lokales', u'http://www.noz.de/rss/Lokales'),
        (u'Vermischtes', u'http://www.noz.de/rss/Vermischtes'),
        (u'Politik', u'http://www.noz.de/rss/Politik'),
        (u'Wirtschaft', u'http://www.noz.de/rss/Wirtschaft'),
        (u'Kultur', u'http://www.noz.de/rss/Kultur'),
        (u'Medien', u'http://www.noz.de/rss/Medien'),
        (u'Wissenschaft', u'http://www.noz.de/rss/wissenschaft'),
        (u'Sport', u'http://www.noz.de/rss/Sport'),
        (u'Computer', u'http://www.noz.de/rss/Computer'),
        (u'Musik', u'http://www.noz.de/rss/Musik'),
        (u'Szene', u'http://www.noz.de/rss/Szene'),
        (u'Niedersachsen', u'http://www.noz.de/rss/Niedersachsen'),
        (u'Kino', u'http://www.noz.de/rss/Kino'),
    ]

But then I noticed that the algorithm has problems with some of the categories, e.g. http://www.noz.de/rss/Computer. Instead of the main article text and picture, it outputs previews of other articles from the same page (I have to admit that the pages are not very RSS-friendly).
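For reference, one way to steer a recipe away from such preview blocks is to switch off auto_cleanup and pick the article container by hand with keep_only_tags and remove_tags. The following is only a sketch under assumptions: the class names 'article-body' and 'teaser' are made-up placeholders and are not taken from the noz.de markup, so they would have to be replaced with whatever the real pages use. The try/except stub is there only so the file can be imported outside calibre.

```python
# -*- coding: utf-8 -*-
# Sketch only: 'article-body' and 'teaser' are hypothetical class names,
# NOT taken from the real noz.de pages.
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Fallback stub so this file can be imported outside calibre.
    class BasicNewsRecipe(object):
        pass


class NOZManualCleanup(BasicNewsRecipe):
    title                 = u'Neue Osnabrücker Zeitung'
    language              = 'de'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    remove_javascript     = True
    use_embedded_content  = False

    # Turn off the automatic guess and select the content manually.
    auto_cleanup = False

    # Keep only the element assumed to hold the article (placeholder class).
    keep_only_tags = [dict(name='div', attrs={'class': 'article-body'})]

    # Drop the assumed preview blocks for other articles (placeholder class).
    remove_tags = [dict(name='div', attrs={'class': 'teaser'})]

    # Start with the one problematic feed while experimenting.
    feeds = [(u'Computer', u'http://www.noz.de/rss/Computer')]
```

If keep_only_tags matches nothing on the page, the articles come out empty, so an empty ebook is often a hint that the selector does not fit the real markup.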
So I used some of the tricks to filter out the bad parts and keep the good ones, but I couldn't get anything to run better than the basic recipe. If only I could have figured out how to create some helpful debug output ...
Most of my attempts resulted in empty ebooks, and I couldn't locate my mistakes.
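On the debug output: the calibre manual suggests testing a recipe from the command line with ebook-convert, using --test (a reduced download), -vv (verbose log) and --debug-pipeline (dumps the intermediate files to a folder). A small sketch that builds that call from Python follows; the file name noz.recipe and the helper names are assumptions for illustration only.

```python
import subprocess


def build_test_command(recipe_path, debug_dir='debug'):
    """Build the ebook-convert call for a verbose test run of a recipe."""
    return [
        'ebook-convert', recipe_path, '.epub',
        '--test',                       # fetch only a small sample of articles
        '-vv',                          # verbose logging
        '--debug-pipeline', debug_dir,  # write intermediate output here
    ]


def run_recipe_test(recipe_path, debug_dir='debug'):
    """Run the test conversion and return the exit code (needs calibre installed)."""
    return subprocess.call(build_test_command(recipe_path, debug_dir))


if __name__ == '__main__':
    # 'noz.recipe' is a hypothetical file name for the saved recipe.
    print(' '.join(build_test_command('noz.recipe')))
```

The downloaded HTML ends up in the input subfolder of the debug folder, which makes it possible to see what the filters actually left over. Inside the recipe itself, self.log('some message') writes to the same verbose output.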

Who would like to help me cook this recipe?

Post #1 · Thread: Request: www.noz.de (German newspaper) · BobbyQ (Member, Posts: 10, Karma: 10, Join Date: Oct 2012, Device: Kindle 4)