Old 01-15-2012, 08:31 AM   #1
roedi06
Device: SONY PRS-T1
This is a modified recipe for Tweakers.net that also includes the reactions users post on the news items. Several people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to build this recipe by modifying the existing one. It works, but not entirely satisfactorily. Please read on, I hope you can help!

Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Tweakers(BasicNewsRecipe):
    title = u'Tweakers.netMOD'
    __author__ = 'Roedi06'
    language = 'nl'
    oldest_article = 7
    max_articles_per_feed = 50

    # Keep the article body and the reactions (comments) block
    keep_only_tags = [
        dict(name='div', attrs={'class': 'columnwrapper news'}),
        {'id': 'reacties'},
    ]

    # Strip page furniture that is of no use on an e-reader
    remove_tags = [
        dict(name='div', attrs={'id': ['utracker']}),
        {'class': ['sidebar']},
        {'class': ['moderation']},
        {'class': ['filterBox']},
        {'id': ['toggleButtonTxt']},
        {'class': ['twitter-share-button']},
        {'class': ['textadTop']},
        {'class': ['commentLink']},
        {'class': ['pageIndex']},
        {'class': ['reactieHeader collapsed']},
    ]

    no_stylesheets = True

    # Drop <hr> and <p> tags, with or without attributes, but keep their content
    preprocess_regexps = [
        (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match: ''),
        (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match: ''),
        (re.compile(r'</p\s*>', re.IGNORECASE), lambda match: ''),
    ]

    extra_css = (
        '.reactieHeader { color: #333333; font-size: 6px; border-bottom: solid 1px #333333; } '
        '.reactieContent { color: #000000; font-size: 8px; }'
    )

    feeds = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]
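
For readers new to recipes: each preprocess_regexps entry is a (compiled pattern, callable) pair, and calibre runs these substitutions on the downloaded HTML before parsing it. Below is a minimal stand-alone sketch of that step, outside calibre. The helper name apply_rules and the sample HTML are made up for illustration, and the patterns are written to tolerate attributes, which is an assumption about what the page may serve.

```python
import re

# Hypothetical rules in the same (pattern, callable) shape as preprocess_regexps
RULES = [
    (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match: ''),
    (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match: ''),
    (re.compile(r'</p\s*>', re.IGNORECASE), lambda match: ''),
]


def apply_rules(html, rules=RULES):
    """Apply each substitution in order (illustrative helper, not calibre API)."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


# Made-up sample: the <p> and <hr> tags go, the <pre> element is left alone
sample = '<div><p class="x">Eerste reactie</p><hr /><pre>code</pre></div>'
print(apply_rules(sample))  # <div>Eerste reactie<pre>code</pre></div>
```

The \b after the tag name is what stops the <p> rule from also eating tags such as <pre>.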
The problem is that the page only loads a limited number of reactions when it is fetched. After reading the forum, I tried the following.

To get around this, the URL has to be modified. I first tried:
Code:
def print_version(self, url):
     return url + '?max=200'
Didn't work!
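
One thing worth ruling out here is that plain concatenation produces a malformed URL when the feed link already has a query string or a fragment. Below is a minimal sketch of a safer way to set the parameter, assuming Python 3's urllib.parse (in the Python 2 that calibre used at the time, these functions live in urlparse and urllib). add_max_param is a hypothetical helper and the article URLs are made up. Whether the server honours max=200 is known only from the browser test described in this post.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def add_max_param(url, max_reactions=200):
    """Return url with max=<max_reactions> set in its query string.

    Hypothetical helper; 'max' is the parameter name used in the post.
    """
    parts = urlsplit(url)
    # Keep existing parameters, but drop any earlier 'max' so it is not duplicated
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != 'max']
    query.append(('max', str(max_reactions)))
    # Rebuild the URL; the fragment (if any) stays after the query string
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up article URLs, only to show the shape of the result
print(add_max_param('http://tweakers.net/nieuws/12345/voorbeeld.html'))
print(add_max_param('http://tweakers.net/nieuws/12345/voorbeeld.html?page=2#reacties'))
```

Inside the recipe, print_version could then simply return add_max_param(url).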

So I tried:
Code:
def print_version(self, url):
            return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')
Didn't work.

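So I tried the get_article_url method:
Code:
def get_article_url(self, article):
    # 'url' was never defined in this attempt; take it from the feed entry first
    url = BasicNewsRecipe.get_article_url(self, article)
    return self.browser.open_novisit(url).geturl() + '?max=200'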
Didn't work...

Are there any suggestions for getting around this? I think a nasty server-side redirect is what is bugging me, and I'm not sure whether there is a workaround. The strange thing is that when I load the URL myself in a browser with '?max=200' appended, it does work.
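
One way to test the redirect theory is to compare the URL that was requested with the URL the server finally answered from, and see whether max was lost on the way. Below is a minimal sketch of that comparison, assuming Python 3's urllib.parse. redirect_dropped_param is a hypothetical helper and the example URLs are made up.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_dropped_param(requested_url, final_url, name='max'):
    """True if `name` was in the requested URL but is missing or changed in the final one."""
    wanted = parse_qs(urlsplit(requested_url).query).get(name)
    got = parse_qs(urlsplit(final_url).query).get(name)
    return wanted is not None and wanted != got


# Made-up URLs: the first "final" URL lost the parameter, the second kept it
requested = 'http://tweakers.net/nieuws/12345?max=200'
print(redirect_dropped_param(requested, 'http://tweakers.net/nieuws/12345/voorbeeld.html'))
print(redirect_dropped_param(requested, 'http://tweakers.net/nieuws/12345/voorbeeld.html?max=200'))
```

In a recipe, the final URL could come from self.browser.open_novisit(url).geturl(), the same call used in the attempts above. If the check shows the parameter being dropped, that would point to the redirect rather than to the recipe code.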
Attached Files
File Type: zip Tweakers.netMOD_1002.zip (927 Bytes, 221 views)

Last edited by roedi06; 01-17-2012 at 05:35 AM.