This is a modified recipe for Tweakers.net that also includes the reactions (comments) users post on the news items. Several people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to put this recipe together by modifying the existing one. It works, but not entirely satisfactorily. Please read on, I hope you can help!
Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__ = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re

from calibre.web.feeds.news import BasicNewsRecipe


class Tweakers(BasicNewsRecipe):
    title = u'Tweakers.netMOD'
    __author__ = 'Roedi06'
    language = 'nl'
    oldest_article = 7
    max_articles_per_feed = 50

    # Keep the news article itself plus the reactions block.
    keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                      {'id':'reacties'},
                      ]

    # Drop page furniture and the reaction controls.
    remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                   {'class' : ['sidebar']},
                   {'class' : ['moderation']},
                   {'class' : ['filterBox']},
                   {'id' : ['toggleButtonTxt']},
                   {'class' : ['twitter-share-button']},
                   {'class' : ['textadTop']},
                   {'class' : ['commentLink']},
                   {'class' : ['pageIndex']},
                   {'class' : ['reactieHeader collapsed']},
                   ]

    no_stylesheets = True

    # Strip bare <hr>, <p> and </p> tags (tags with attributes are not matched).
    preprocess_regexps = [
        (re.compile(r'<hr*?>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        (re.compile(r'<p*?>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        (re.compile(r'</p*?>', re.IGNORECASE | re.DOTALL), lambda match : ''),
    ]

    extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                 .reactieContent { color: #000000; font-size: 8px; }'

    feeds = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]
The problem is that the page only loads a limited number of reactions when it is fetched.

After reading the forum, I tried the following. To get around this, the URL has to be modified, so I tried:
Code:
def print_version(self, url):
    return url + '?max=200'
Didn't work!
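One thing that might matter with plain concatenation: if the article URL already carries a query string or a fragment, adding '?max=200' at the end gives a malformed URL. Below is a minimal sketch of a safer way to set the parameter. It assumes Python 3's urllib.parse (the recipe above is written for an older, Python 2 based calibre), and add_max_param is a hypothetical helper name, not part of calibre.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_max_param(url, limit=200):
    """Return url with max=<limit> set, keeping any existing query and fragment.

    Hypothetical helper for illustration; 'max' is the parameter name used
    in this thread, the default of 200 is the value tried above.
    """
    parts = urlsplit(url)
    # Drop any existing 'max' value so the parameter is not duplicated.
    query = [(k, v)
             for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'max']
    query.append(('max', str(limit)))
    # Rebuild the URL; the fragment (if any) stays after the query string.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside the recipe this could be called from print_version as return add_max_param(url). Whether it changes the outcome depends on what the feed's article links actually look like.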
So I tried:
Code:
def print_version(self, url):
    return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')
Didn't work.
So I tried the get_article_url method:
Code:
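def get_article_url(self, article):
    # 'url' was never defined here; take it from the default implementation
    url = BasicNewsRecipe.get_article_url(self, article)
    return self.browser.open_novisit(url).geturl() + '?max=200'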
Didn't work...
Are there any suggestions on how to get around this?

I think a nasty server-side redirect is what's bugging me, and I'm not sure whether there is a workaround. The strange thing is that when I load the URL myself in the browser with '?max=200' added, it does work.
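One way to check the redirect theory is to look at the URL the server finally serves, which the attempts above already obtain through geturl(), and see whether 'max' is still in it. A minimal sketch, assuming Python 3's urllib.parse; param_survived is a hypothetical helper name, not part of calibre:

```python
from urllib.parse import urlsplit, parse_qs


def param_survived(final_url, name='max'):
    """Return True if the query parameter `name` is present in final_url.

    final_url is meant to be the post-redirect URL, e.g. the value of
    self.browser.open_novisit(url_with_max).geturl() inside the recipe.
    """
    return name in parse_qs(urlsplit(final_url).query)
```

If this returns False for a request that was sent with '?max=200', the redirect is dropping the query string on the way, which would point at the server rather than at the recipe.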