#1
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
This is a modified recipe for Tweakers.net that includes the reactions users post on the news items. Several people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to construct this recipe by modifying the existing one. It works, but not entirely satisfactorily. Please read on; I hope you can help!
Code:

    #!/usr/bin/env python
    # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
    from __future__ import with_statement

    __license__ = 'GPL v3'
    __docformat__ = 'restructuredtext en'

    import re
    from calibre.web.feeds.news import BasicNewsRecipe

    class Tweakers(BasicNewsRecipe):
        title = u'Tweakers.netMOD'
        __author__ = 'Roedi06'
        language = 'nl'
        oldest_article = 7
        max_articles_per_feed = 50

        # Keep the news article itself plus the block with the reactions.
        keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                          {'id':'reacties'},
                         ]

        remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                       {'class' : ['sidebar']},
                       {'class' : ['moderation']},
                       {'class' : ['filterBox']},
                       {'id' : ['toggleButtonTxt']},
                       {'class' : ['twitter-share-button']},
                       {'class' : ['textadTop']},
                       {'class' : ['commentLink']},
                       {'class' : ['pageIndex']},
                       {'class' : ['reactieHeader collapsed']},
                      ]

        no_stylesheets=True

        preprocess_regexps = [
            # Strip <hr> tags, with or without attributes (was r'<hr*?>').
            (re.compile(r'<hr[^>]*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            # Strip bare <p> and </p> tags (were r'<p*?>' and r'</p*?>').
            (re.compile(r'<p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            (re.compile(r'</p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        ]

        extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                     .reactieContent { color: #000000; font-size: 8px; }'

        feeds = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]

To get around this, the URL has to be modified. I tried this:

Code:

    def print_version(self, url):
        return url + '?max=200'

That did not work, so I tried:

Code:

    def print_version(self, url):
        return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')

So I tried the get_article_url method:

Code:

    def get_article_url(self, article):
        # Note: 'url' is never defined inside this method, so this line raises a NameError.
        return self.browser.open_novisit(url).geturl() + '?max=200'

Are there any suggestions for how to get around this?

Last edited by roedi06; 01-17-2012 at 05:35 AM.
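A side note on the `url + '?max=200'` approach: plain string concatenation gives a malformed URL if the address already carries a query string or a fragment. Below is a minimal sketch of a safer helper that uses only the standard library. The name `with_max_reactions` and its default of 200 are illustrative assumptions, not part of calibre; in a recipe it would be called from `print_version`.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def with_max_reactions(url, max_reactions=200):
    """Return url with its `max` query parameter set to max_reactions.

    Hypothetical helper: the `max` parameter name is taken from the
    recipe in this thread; the rest is plain URL handling.
    """
    parts = urlsplit(url)
    # Keep existing query parameters, dropping any earlier `max` value.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'max']
    query.append(('max', str(max_reactions)))
    # Rebuild the URL; path and fragment are left untouched.
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside a recipe, `print_version` could then simply return `with_max_reactions(url)`.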
#2
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
As mentioned, I'm a newbie. How can I check whether the 'get_article_url' or 'print_version' methods are actually being called? I'm working under Windows, so I don't have a command prompt to work with 'print'.
I would really appreciate your help!
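One way to check whether a method is being called, without relying on `print`, is to wrap it so that every call is written to a log. The sketch below uses only Python's standard `logging` module and a stand-in class. `traced` and `FakeRecipe` are hypothetical names made up for the illustration; inside a real recipe the same messages could presumably be sent through the recipe's own `self.log` instead.

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger('recipe-debug')
log.setLevel(logging.INFO)


def traced(method):
    """Log every call to `method` together with its arguments and result."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        log.info('%s%r -> %r', method.__name__, args, result)
        return result
    return wrapper


class FakeRecipe(object):
    # Stand-in for a recipe class; this is not calibre's BasicNewsRecipe.
    @traced
    def print_version(self, url):
        return url + '?max=200'


# Each call now leaves a line in the log, which shows the method was reached.
FakeRecipe().print_version('http://example.com/article.html')
```

If no log line appears for a method, it was never called. The calibre manual also describes running a recipe from a command prompt with `ebook-convert myrecipe.recipe .epub --test -vv`, which prints the recipe's log output while it downloads.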
#3
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
Fixed it! I followed the redirect of the RSS feed by simply opening it in my browser and seeing where it took me. I added that link to my recipe. With that link the 'print_version' method does work! Now I'll continue working on the style.
This is the code so far:

Code:

    #!/usr/bin/env python
    # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
    from __future__ import with_statement

    __license__ = 'GPL v3'
    __docformat__ = 'restructuredtext en'

    import re
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ptempfile import PersistentTemporaryFile

    class Tweakers(BasicNewsRecipe):
        title = u'Tweakers.netMOD2'
        __author__ = 'Roedi06'
        language = 'nl'
        oldest_article = 7
        max_articles_per_feed = 3

        # Keep the news article itself plus the block with the reactions.
        keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                          {'id':'reacties'},
                         ]

        remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                       {'class' : ['sidebar']},
                       {'class' : ['moderation']},
                       {'class' : ['filterBox']},
                       {'id' : ['toggleButtonTxt']},
                       {'class' : ['twitter-share-button']},
                       {'class' : ['textadTop']},
                       {'class' : ['commentLink']},
                       {'class' : ['pageIndex']},
                       {'class' : ['reactieHeader collapsed']},
                      ]

        no_stylesheets=True

        preprocess_regexps = [
            # Strip <hr> tags, with or without attributes (was r'<hr*?>').
            (re.compile(r'<hr[^>]*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            # Strip bare <p> and </p> tags (were r'<p*?>' and r'</p*?>').
            (re.compile(r'<p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            (re.compile(r'</p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
        ]

        extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 1px #333333; } \
                     .reactieContent { color: #000000; font-size: 8px; }'

        # Feed address found by following the redirect of the original feed.
        feeds = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

        def print_version(self, url):
            return url + '?max=200'
#4
Junior Member
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
Completed!
I successfully completed the recipe for 'Tweakers.net - Including reactions'.
For details, see the code below or have a look at the attached file. Kovid, can you have a look at it and maybe include it in a future release?

Code:

    #!/usr/bin/env python
    # vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
    from __future__ import with_statement

    __license__ = 'GPL v3'
    __docformat__ = 'restructuredtext en'

    import re
    from calibre.web.feeds.news import BasicNewsRecipe
    from calibre.ptempfile import PersistentTemporaryFile

    class Tweakers(BasicNewsRecipe):
        title = u'Tweakers.net - with Reactions'
        __author__ = 'Roedi06'
        language = 'nl'
        oldest_article = 7
        max_articles_per_feed = 100
        cover_url = 'http://img51.imageshack.us/img51/7470/tweakersnetebook.gif'

        # Keep the news article itself plus the block with the reactions.
        keep_only_tags = [dict(name='div', attrs={'class':'columnwrapper news'}),
                          {'id':'reacties'},
                         ]

        remove_tags = [dict(name='div', attrs={'id' : ['utracker']}),
                       {'id' : ['channelNav']},
                       {'id' : ['contentArea']},
                       {'class' : ['breadCrumb']},
                       {'class' : ['nextPrevious ellipsis']},
                       {'class' : ['advertorial']},
                       {'class' : ['sidebar']},
                       {'class' : ['filterBox']},
                       {'id' : ['toggleButtonTxt']},
                       {'id' : ['socialButtons']},
                       {'class' : ['button']},
                       {'class' : ['textadTop']},
                       {'class' : ['commentLink']},
                       {'title' : ['Reageer op deze reactie']},
                       {'class' : ['pageIndex']},
                       {'class' : ['reactieHeader collapsed']},
                      ]

        no_stylesheets=True

        preprocess_regexps = [
            # Strip <hr> tags, with or without attributes (was r'<hr*?>').
            (re.compile(r'<hr[^>]*>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            # Strip bare <p>, </p>, <span class="new"> and </span> tags.
            (re.compile(r'<p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            (re.compile(r'</p>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            # Render links as bold, underlined text.
            (re.compile(r'<a.*?>'), lambda h1: '<b><u>'),
            (re.compile(r'</a>'), lambda h2: '</u></b>'),
            (re.compile(r'<span class="new">', re.IGNORECASE | re.DOTALL), lambda match : ''),
            (re.compile(r'</span>', re.IGNORECASE | re.DOTALL), lambda match : ''),
            # Put the moderation score as text in front of each moderation block ...
            (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'), lambda match : ' - moderated 0<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'),
            (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'), lambda match : ' - moderated +1<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'),
            (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'), lambda match : ' - moderated +2<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'),
            (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'), lambda match : ' - moderated +3<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'),
            # ... and then remove the moderation block itself.
            (re.compile(r'<div class="moderation">.*?</div>'), lambda h1: ''),
        ]

        extra_css = '.reactieHeader { color: #333333; font-size: 6px; border-bottom:solid 2px #333333; border-top:solid 1px #333333; } \
                     .reactieContent { font-family:"Times New Roman",Georgia,Serif; color: #000000; font-size: 8px; } \
                     .quote { font-family:"Times New Roman",Georgia,Serif; padding-left:2px; border-left:solid 3px #666666; color: #666666; }'

        feeds = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

        def print_version(self, url):
            return url + '?max=200'
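The four near-identical `score_0` to `score_3` rules in the recipe could be collapsed into a single pattern with a capture group. This is only a sketch of that idea, not the code that went into calibre. `MODERATION_IMG` and `label_score` are names made up for the example, and the image URL prefix is copied from the recipe.

```python
import re

# One pattern for the scores 0-3 instead of four separate ones.
MODERATION_IMG = re.compile(
    r'<div class="moderation"><img '
    r'src="http://tweakimg\.net/g/if/comments/score_([0-3])'
)


def label_score(match):
    """Prefix the matched moderation block with its score as plain text."""
    score = int(match.group(1))
    # The recipe writes "0" for a zero score and "+N" for positive scores.
    label = '0' if score == 0 else '+%d' % score
    # Keep the matched markup so a later rule can still remove the whole div.
    return ' - moderated %s%s' % (label, match.group(0))


def add_score_labels(html):
    """Apply the combined rule to a piece of HTML."""
    return MODERATION_IMG.sub(label_score, html)
```

In a recipe, the pair `(MODERATION_IMG, label_score)` could stand in for the four separate entries in `preprocess_regexps`, placed before the rule that deletes the `moderation` div.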
#5
creator of calibre
Posts: 45,195
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Done.