MobileRead Forums - View Single Post - Modified Recipe Tweakers.net

roedi06 · 01-15-2012, 08:31 AM

This is a modified recipe for Tweakers.net that includes the reactions users post on the news items. Several people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to build this recipe by modifying the existing one. It works, but not entirely satisfactorily; please read on, I hope you can help!

Code:

#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Tweakers(BasicNewsRecipe):
     title          = u'Tweakers.netMOD'
     __author__     = 'Roedi06'
     language       = 'nl'
     oldest_article = 7
     max_articles_per_feed = 50

     keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                       {'id':'reacties'},
                      ]

     remove_tags    = [dict(name='div', attrs={'id' : ['utracker']}),
                        {'class' : ['sidebar']},
                        {'class' : ['moderation']},
                        {'class' : ['filterBox']},
                        {'id' : ['toggleButtonTxt']},
                        {'class' : ['twitter-share-button']},
                        {'class' : ['textadTop']},
                        {'class' : ['commentLink']},
                        {'class' : ['pageIndex']},
                        {'class' : ['reactieHeader collapsed']},
                      ]


     no_stylesheets=True

     preprocess_regexps = [
     # Strip <hr> tags and opening/closing <p> tags, including any attributes
     (re.compile(r'<hr\b[^>]*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
     (re.compile(r'<p\b[^>]*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
     (re.compile(r'</p\s*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
     ]

     extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                  .reactieContent { color: #000000; font-size: 8px; }'

     feeds          = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]

The problem is that the page loads only a limited number of reactions when it is fetched.

After reading the forum, I concluded that the URL has to be modified to get around this. I first tried:

Code:

def print_version(self, url):
     return url + '?max=200'

Didn't work!

So I tried:

Code:

def print_version(self, url):
     return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')

Didn't work.

So I tried the get_article_url method:

Code:

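def get_article_url(self, article):
     # Start from the URL calibre would normally use for this article
     url = BasicNewsRecipe.get_article_url(self, article)
     return self.browser.open_novisit(url).geturl() + '?max=200'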

Didn't work...

Are there any suggestions on how to get around this?

I think a nasty server-side redirect is what is causing the trouble, and I'm not sure there is a workaround. The strange thing is that when I load the URL myself in the browser with '?max=200' appended, it does work.
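
One thing I can still rule out is that the URL itself ends up malformed: if an article link already carries a query string or a fragment, plain string concatenation puts '?max=200' in the wrong place. Below is a minimal sketch of a safer way to add the parameter, assuming Python 3's urllib.parse is available. The helper name with_max_reactions is made up for illustration, and the example URLs are placeholders, not real article links.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_max_reactions(url, max_count=200):
    """Return url with the query parameter max=<max_count> set.

    Hypothetical helper: it only rewrites the URL string and does not
    contact the server or follow any redirect.
    """
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier 'max' value
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'max']
    query.append(('max', str(max_count)))
    # Rebuild the URL so the fragment (if any) stays after the query string
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode(query), parts.fragment))


if __name__ == '__main__':
    print(with_max_reactions('http://example.com/nieuws/1/item.html'))
    print(with_max_reactions('http://example.com/nieuws/1/item.html?page=2#reacties'))
```

In the recipe, print_version could then return with_max_reactions(url) instead of concatenating strings. This only guarantees a well-formed URL; it does not tell me whether the server keeps the parameter across its redirect, so it may not solve the problem by itself.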