#1
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
This is a modified recipe for Tweakers.net that includes the reactions users post on the news items. Multiple people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to construct this recipe by modifying the existing one. It works, but not entirely satisfactorily. Please read on; I hope you can help!
Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe


class Tweakers(BasicNewsRecipe):
    title = u'Tweakers.netMOD'
    __author__ = 'Roedi06'
    language = 'nl'
    oldest_article = 7
    max_articles_per_feed = 50
    # Keep only the article body and the reactions block.
    keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                      {'id':'reacties'},
                      ]
    remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                   {'class' : ['sidebar']},
                   {'class' : ['moderation']},
                   {'class' : ['filterBox']},
                   {'id' : ['toggleButtonTxt']},
                   {'class' : ['twitter-share-button']},
                   {'class' : ['textadTop']},
                   {'class' : ['commentLink']},
                   {'class' : ['pageIndex']},
                   {'class' : ['reactieHeader collapsed']},
                   ]
    no_stylesheets = True
    # Strip <hr> and <p> tags, with or without attributes.
    # (The earlier patterns '<hr*?>' and '<p*?>' only matched the bare tags.)
    preprocess_regexps = [
        (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match : ''),
        (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match : ''),
        (re.compile(r'</p>', re.IGNORECASE), lambda match : ''),
    ]
    extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                 .reactieContent { color: #000000; font-size: 8px; }'
    feeds = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]
After reading the forum I tried the following. To get around this, the URL has to be modified. I tried to do so with: Code:
    def print_version(self, url):
        return url + '?max=200'
So I tried: Code:
    def print_version(self, url):
        # Follow the redirect first, then add the parameter to the final URL.
        return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')
So I tried the get_article_url method: Code:
    def get_article_url(self, article):
        # Start from the URL calibre would normally use for this article.
        url = BasicNewsRecipe.get_article_url(self, article)
        return self.browser.open_novisit(url).geturl() + '?max=200'
Are there any suggestions for getting around this? I think a nasty server-side redirect is what is bugging me, and I'm not sure whether there is a workaround. The strange thing is that when I load the URL myself in the browser with '?max=200' appended, it does work.
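Appending '?max=200' with plain string concatenation only produces a valid URL when the address has no query string yet. A minimal sketch of a safer way to add the parameter, assuming Python 3's standard library (the recipes in this thread target the Python 2 bundled with calibre at the time); the helper name `with_query_param` and the example.com URLs are hypothetical, only the `max` parameter comes from the post:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_query_param(url, key, value):
    """Return url with key=value set in its query string (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any previous value for the same key.
    params = [(k, v)
              for k, v in parse_qsl(parts.query, keep_blank_values=True)
              if k != key]
    params.append((key, str(value)))
    return urlunsplit(parts._replace(query=urlencode(params)))


print(with_query_param("http://example.com/nieuws/1/item.html", "max", 200))
print(with_query_param("http://example.com/nieuws/1/item.html?page=2", "max", 200))
```

Inside a recipe, `print_version` could return the result of such a helper instead of `url + '?max=200'`. This does not by itself solve the redirect problem described above; it only keeps the resulting URL well-formed.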
Last edited by roedi06; 01-17-2012 at 06:35 AM.
#2
As mentioned, I'm a newbie. How can I check whether the 'get_article_url' or 'print_version' methods are actually being called? I'm working under Windows, so I don't have a command prompt to work with 'print'.
I would really appreciate your help!
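One way to see whether a method runs is to wrap it so every call is logged. The sketch below is standalone and does not use calibre: `log_calls` and `DemoRecipe` are hypothetical names, and `DemoRecipe` is only a stand-in for a recipe class. In a real recipe the message would more likely go through the recipe's own `self.log`, and calibre's documentation describes running a recipe from a command prompt with `ebook-convert myrecipe.recipe .epub --test -vv`, which prints the log output.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("recipe-debug")


def log_calls(method):
    """Wrap a method so every call is logged with its arguments and result."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        log.info("%s%r -> %r", method.__name__, args, result)
        return result
    return wrapper


class DemoRecipe:
    # Stand-in for a recipe class; NOT calibre's BasicNewsRecipe.
    @log_calls
    def print_version(self, url):
        return url + "?max=200"


DemoRecipe().print_version("http://example.com/item.html")
```

If nothing is logged for a method during a test run, that method was never called, which narrows the problem down to the URL the recipe starts from rather than the method body.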
#3
Fixed it! I followed the redirect of the RSS feed by simply opening it in my browser and seeing where it took me. I added that link to my recipe. With that link the 'print_version' method does work! Now I'll continue working on the style.
This is the code so far: Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class Tweakers(BasicNewsRecipe):
    title = u'Tweakers.netMOD2'
    __author__ = 'Roedi06'
    language = 'nl'
    oldest_article = 7
    max_articles_per_feed = 3
    # Keep only the article body and the reactions block.
    keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                      {'id':'reacties'},
                      ]
    remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                   {'class' : ['sidebar']},
                   {'class' : ['moderation']},
                   {'class' : ['filterBox']},
                   {'id' : ['toggleButtonTxt']},
                   {'class' : ['twitter-share-button']},
                   {'class' : ['textadTop']},
                   {'class' : ['commentLink']},
                   {'class' : ['pageIndex']},
                   {'class' : ['reactieHeader collapsed']},
                   ]
    no_stylesheets = True
    # Strip <hr> and <p> tags, with or without attributes.
    # (The earlier patterns '<hr*?>' and '<p*?>' only matched the bare tags.)
    preprocess_regexps = [
        (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match : ''),
        (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match : ''),
        (re.compile(r'</p>', re.IGNORECASE), lambda match : ''),
    ]
    extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                 .reactieContent { color: #000000; font-size: 8px; }'
    # Use the address the original feed redirects to.
    feeds = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

    def print_version(self, url):
        # Ask for up to 200 reactions on the article page.
        return url + '?max=200'
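The fix in this post was found by opening the feed in a browser and noting where it ended up. The same check can be sketched in code. This is a rough illustration using Python 3's `urllib.request` rather than calibre's own browser object; `resolve_redirect` and its `opener` parameter are hypothetical names, and `opener` exists only so the function can be tried without network access.

```python
import urllib.request


def resolve_redirect(url, opener=urllib.request.urlopen):
    """Open url, follow any redirects, and return the final URL.

    `opener` is an injectable callable (hypothetical parameter) that takes a
    URL and returns a response object with geturl() and close().
    """
    response = opener(url)
    try:
        return response.geturl()
    finally:
        response.close()
```

Calling `resolve_redirect()` on the original feed address would need a network connection; whatever it returns is the address to put in `feeds`.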
#4
Completed!
I successfully completed the recipe for 'Tweakers.net - Including reactions'.
For details, see the code below or have a look at the attached file. Kovid, can you have a look at it and maybe include it in a future release? I and other people in the tweakers.net community would be thankful! Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement
__license__ = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile


class Tweakers(BasicNewsRecipe):
    title = u'Tweakers.net - with Reactions'
    __author__ = 'Roedi06'
    language = 'nl'
    oldest_article = 7
    max_articles_per_feed = 100
    cover_url = 'http://img51.imageshack.us/img51/7470/tweakersnetebook.gif'
    # Keep only the article body and the reactions block.
    keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                      {'id':'reacties'},
                      ]
    remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                   {'id' : ['channelNav']},
                   {'id' : ['contentArea']},
                   {'class' : ['breadCrumb']},
                   {'class' : ['nextPrevious ellipsis']},
                   {'class' : ['advertorial']},
                   {'class' : ['sidebar']},
                   {'class' : ['filterBox']},
                   {'id' : ['toggleButtonTxt']},
                   {'id' : ['socialButtons']},
                   {'class' : ['button']},
                   {'class' : ['textadTop']},
                   {'class' : ['commentLink']},
                   {'title' : ['Reageer op deze reactie']},
                   {'class' : ['pageIndex']},
                   {'class' : ['reactieHeader collapsed']},
                   ]
    no_stylesheets = True
    preprocess_regexps = [
        # Strip <hr> (with or without attributes) and bare <p> tags.
        (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match : ''),
        (re.compile(r'<p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        (re.compile(r'</p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        # Show links as bold, underlined text (\b keeps <abbr> etc. untouched).
        (re.compile(r'<a\b[^>]*>'), lambda h1: '<b><u>'),
        (re.compile(r'</a>'), lambda h2: '</u></b>'),
        (re.compile(r'<span class="new">', re.IGNORECASE | re.DOTALL), lambda match : ''),
        (re.compile(r'</span>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        # Write the moderation score as text in front of the score image ...
        (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'), lambda match : ' - moderated 0<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'),
        (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'), lambda match : ' - moderated +1<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'),
        (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'), lambda match : ' - moderated +2<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'),
        (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'), lambda match : ' - moderated +3<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'),
        # ... and then drop the moderation block itself.
        (re.compile(r'<div class="moderation">.*?</div>'), lambda h1: ''),
    ]
    extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 2px #333333; border-top:solid 1px #333333; } \
                 .reactieContent { font-family:"Times New Roman",Georgia,Serif; color: #000000; font-size: 8px; } \
                 .quote { font-family:"Times New Roman",Georgia,Serif; padding-left:2px; border-left:solid 3px #666666; color: #666666; }'
    feeds = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

    def print_version(self, url):
        # Ask for up to 200 reactions on the article page.
        return url + '?max=200'
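The four nearly identical moderation patterns in the recipe could probably be collapsed into one pattern with a capture group. A minimal standalone sketch of that idea, using only the `re` module; the names `MODERATION_IMG`, `label_score` and `add_moderation_labels` are hypothetical, and the HTML in the usage line is a made-up fragment shaped like the markup the recipe's patterns expect:

```python
import re

# One pattern for score_0 .. score_3; the digit is captured as group 1.
MODERATION_IMG = re.compile(
    r'<div class="moderation"><img src="http://tweakimg\.net/g/if/comments/score_([0-3])'
)


def label_score(match):
    """Put ' - moderated N' in front of the matched markup, leaving it intact."""
    score = int(match.group(1))
    label = "0" if score == 0 else "+%d" % score
    return " - moderated %s%s" % (label, match.group(0))


def add_moderation_labels(html):
    """Apply the label to every moderation block found in html."""
    return MODERATION_IMG.sub(label_score, html)


sample = '<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2.gif"></div>'
print(add_moderation_labels(sample))
```

In a recipe, the pair `(MODERATION_IMG, label_score)` would take the place of the four score entries in `preprocess_regexps`, ahead of the pattern that removes the moderation block.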
#5
creator of calibre
Posts: 45,618
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Done.