Old 01-15-2012, 08:31 AM   #1
roedi06
Junior Member
roedi06 began at the beginning.
 
Posts: 9
Karma: 10
Join Date: Jan 2012
Device: SONY PRS-T1
This is a modified recipe for Tweakers.net that includes the reactions (user comments) posted on each news item. Several people requested this, since the reactions often contain valuable information. I'm a newbie, but I was able to build this recipe by modifying the existing one. It works, but not entirely satisfactorily. Please read on; I hope you can help!

Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe

class Tweakers(BasicNewsRecipe):
     title          = u'Tweakers.netMOD'
     __author__     = 'Roedi06'
     language       = 'nl'
     oldest_article = 7
     max_articles_per_feed = 50

     keep_only_tags = [
         dict(name='div', attrs={'class': 'columnwrapper news'}),
         {'id': 'reacties'},
     ]

     remove_tags = [
         dict(name='div', attrs={'id': ['utracker']}),
         {'class': ['sidebar']},
         {'class': ['moderation']},
         {'class': ['filterBox']},
         {'id': ['toggleButtonTxt']},
         {'class': ['twitter-share-button']},
         {'class': ['textadTop']},
         {'class': ['commentLink']},
         {'class': ['pageIndex']},
         {'class': ['reactieHeader collapsed']},
     ]


     no_stylesheets=True

     # Strip <hr> and <p> tags, with or without attributes, before parsing.
     preprocess_regexps = [
         (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'</p\s*>', re.IGNORECASE), lambda match: ''),
     ]

     extra_css = (
         '.reactieHeader { color: #333333; font-size: 6px; border-bottom: solid 1px #333333; } '
         '.reactieContent { color: #000000; font-size: 8px; }'
     )

	 
     feeds          = [(u'Tweakers.net', u'http://tweakers.net/feeds/nieuws.xml')]
The problem is that the page only loads a limited number of reactions when it is fetched. After reading the forum, I tried the following:

To get around this, the URL has to be modified. I tried doing so with:
Code:
def print_version(self, url):
     return url + '?max=200'
Didn't work!

So I tried:
Code:
def print_version(self, url):
    return self.browser.open_novisit(url).geturl().replace('html', 'html?max=200')
Didn't work.

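So I tried the get_article_url method:
Code:
def get_article_url(self, article):
    # Start from the link in the feed entry; `url` is not otherwise defined in this method.
    url = BasicNewsRecipe.get_article_url(self, article)
    return self.browser.open_novisit(url).geturl() + '?max=200'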
Didn't work...

Are there any suggestions for getting around this? I think a nasty server-side redirect is what's bugging me, and I'm not sure whether there is a workaround. The strange thing is that when I load the URL myself in a browser with '?max=200' appended, it does work.
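
One thing that may be worth ruling out when appending '?max=200': plain string concatenation produces a malformed URL if the link already carries a query string. The sketch below is only an illustration of adding the parameter safely with the standard library. The helper name `with_max_reactions` is made up for this example, and it assumes Python 3; it is not part of calibre.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_max_reactions(url, max_reactions=200):
    """Return `url` with a `max` query parameter set to `max_reactions`.

    Hypothetical helper: keeps any existing query parameters and the
    fragment, and replaces an existing `max` value instead of adding a
    second one.
    """
    parts = urlsplit(url)
    # Keep every existing parameter except a previous `max`.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != 'max']
    query.append(('max', str(max_reactions)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Inside a recipe, `print_version` could then simply return `with_max_reactions(url)`.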
Attached Files
File Type: zip Tweakers.netMOD_1002.zip (927 Bytes, 43 views)

Last edited by roedi06; 01-17-2012 at 05:35 AM.
Old 01-16-2012, 03:45 PM   #2
roedi06
As mentioned, I'm a newbie. How can I check whether the 'get_article_url' and 'print_version' methods are actually being called? I'm working under Windows, so I don't have a command prompt to work with 'print'.

I would really appreciate your help!
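
A possible way to check this: calibre recipes have a `self.log` object that can be called with a message, and running `ebook-convert myrecipe.recipe .epub --test -vv` from the Windows command prompt (cmd.exe) prints that log (`myrecipe.recipe` is a placeholder file name). The sketch below shows the general idea with a small decorator that records every call. `traced`, `CALLS`, and `DemoRecipe` are made-up names, and `DemoRecipe` is only a stand-in so the example runs without calibre.

```python
import functools

CALLS = []  # hypothetical in-memory record of traced calls


def traced(method):
    """Record each call to `method` with its arguments and return value."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        result = method(self, *args, **kwargs)
        CALLS.append((method.__name__, args, result))
        # In a real recipe, `self.log` would write to calibre's job log.
        log = getattr(self, 'log', None)
        if callable(log):
            log('%s%r -> %r' % (method.__name__, args, result))
        return result
    return wrapper


class DemoRecipe(object):
    """Stand-in for a recipe class so the sketch runs without calibre."""

    def __init__(self):
        self.messages = []
        self.log = self.messages.append  # collects messages instead of printing

    @traced
    def print_version(self, url):
        return url + '?max=200'
```

If `print_version` never shows up in the log during a test run, the method is not being called for that feed.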
Old 01-16-2012, 04:34 PM   #3
roedi06
Fixed it! I followed the redirect of the RSS feed by simply opening it in my browser and seeing where it took me. I added that link to my recipe, and with that link the 'print_version' method does work! Now I'll continue working on the style.
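
The same redirect check can also be done from Python instead of the browser. This is a minimal sketch assuming Python 3 and the standard library only. `resolve_final_url` is a made-up helper name, and the `opener` parameter exists only so the function can be exercised without network access.

```python
import urllib.request


def resolve_final_url(url, opener=urllib.request.urlopen, timeout=10):
    """Return the URL that `url` ends up at after any HTTP redirects.

    Hypothetical helper: `opener` defaults to urllib's `urlopen`, which
    follows redirects and reports the final address through `geturl()`.
    """
    response = opener(url, timeout=timeout)
    try:
        return response.geturl()
    finally:
        response.close()
```

Calling `resolve_final_url('http://tweakers.net/feeds/nieuws.xml')` would report wherever the feed address currently redirects to; that result can then be used as the feed URL in the recipe.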

This is the code so far:
Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class Tweakers(BasicNewsRecipe):
     title          = u'Tweakers.netMOD2'
     __author__     = 'Roedi06'
     language       = 'nl'
     oldest_article = 7
     max_articles_per_feed = 3

     keep_only_tags = [
         dict(name='div', attrs={'class': 'columnwrapper news'}),
         {'id': 'reacties'},
     ]

     remove_tags = [
         dict(name='div', attrs={'id': ['utracker']}),
         {'class': ['sidebar']},
         {'class': ['moderation']},
         {'class': ['filterBox']},
         {'id': ['toggleButtonTxt']},
         {'class': ['twitter-share-button']},
         {'class': ['textadTop']},
         {'class': ['commentLink']},
         {'class': ['pageIndex']},
         {'class': ['reactieHeader collapsed']},
     ]
     no_stylesheets=True

     # Strip <hr> and <p> tags, with or without attributes, before parsing.
     preprocess_regexps = [
         (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'<p\b[^>]*>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'</p\s*>', re.IGNORECASE), lambda match: ''),
     ]

     extra_css = (
         '.reactieHeader { color: #333333; font-size: 6px; border-bottom: solid 1px #333333; } '
         '.reactieContent { color: #000000; font-size: 8px; }'
     )

	 
     feeds          = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

     def print_version(self, url):
        return url + '?max=200'
Old 01-17-2012, 05:34 AM   #4
roedi06
Completed!

I successfully completed the recipe for 'Tweakers.net - Including reactions'.

For details, see the code below or have a look at the attached file. Kovid, can you have a look at it and maybe include it in a future release? I and other people in the tweakers.net community would be thankful!

Code:
#!/usr/bin/env python
# vim:fileencoding=UTF-8:ts=4:sw=4:sta:et:sts=4:ai
from __future__ import with_statement

__license__   = 'GPL v3'
__docformat__ = 'restructuredtext en'

import re
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class Tweakers(BasicNewsRecipe):
     title          = u'Tweakers.net - with Reactions'
     __author__     = 'Roedi06'
     language       = 'nl'
     oldest_article = 7
     max_articles_per_feed = 100
     cover_url       = 'http://img51.imageshack.us/img51/7470/tweakersnetebook.gif'
	
     keep_only_tags = [
         dict(name='div', attrs={'class': 'columnwrapper news'}),
         {'id': 'reacties'},
     ]

     remove_tags = [
         dict(name='div', attrs={'id': ['utracker']}),
         {'id': ['channelNav']},
         {'id': ['contentArea']},
         {'class': ['breadCrumb']},
         {'class': ['nextPrevious ellipsis']},
         {'class': ['advertorial']},
         {'class': ['sidebar']},
         {'class': ['filterBox']},
         {'id': ['toggleButtonTxt']},
         {'id': ['socialButtons']},
         {'class': ['button']},
         {'class': ['textadTop']},
         {'class': ['commentLink']},
         {'title': ['Reageer op deze reactie']},
         {'class': ['pageIndex']},
         {'class': ['reactieHeader collapsed']},
     ]
     no_stylesheets=True

     preprocess_regexps = [
         # Drop <hr> tags (with or without attributes) and bare <p> tags.
         (re.compile(r'<hr\b[^>]*>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'<p>', re.IGNORECASE), lambda match: ''),
         (re.compile(r'</p>', re.IGNORECASE), lambda match: ''),
         # Render links as bold, underlined text; \b keeps tags such as <abbr> untouched.
         (re.compile(r'<a\b[^>]*>'), lambda match: '<b><u>'),
         (re.compile(r'</a>'), lambda match: '</u></b>'),
         (re.compile(r'<span class="new">', re.IGNORECASE), lambda match: ''),
         (re.compile(r'</span>', re.IGNORECASE), lambda match: ''),
         # Write the moderation score as text in front of each moderation block...
         (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'), lambda match: ' - moderated 0<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_0'),
         (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'), lambda match: ' - moderated +1<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_1'),
         (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'), lambda match: ' - moderated +2<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_2'),
         (re.compile(r'<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'), lambda match: ' - moderated +3<div class="moderation"><img src="http://tweakimg.net/g/if/comments/score_3'),
         # ...then remove the moderation block itself.
         (re.compile(r'<div class="moderation">.*?</div>'), lambda match: ''),
     ]

     extra_css = (
         '.reactieHeader { color: #333333; font-size: 6px; border-bottom: solid 2px #333333; border-top: solid 1px #333333; } '
         '.reactieContent { font-family: "Times New Roman", Georgia, serif; color: #000000; font-size: 8px; } '
         '.quote { font-family: "Times New Roman", Georgia, serif; padding-left: 2px; border-left: solid 3px #666666; color: #666666; }'
     )
	   
	   

	 
     feeds          = [(u'Tweakers.net', u'http://feeds.feedburner.com/tweakers/nieuws')]

     def print_version(self, url):
        return url + '?max=200'
Attached Files
File Type: zip Tweakers.net including React_1000.zip (429 Bytes, 106 views)
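
The four near-identical moderation-score patterns in the recipe could probably be folded into a single pattern with a capture group. This is only a sketch of that idea, not the version that was submitted. `MODERATION_IMG` and `label_score` are made-up names, and the score range 0 to 3 is taken from the patterns in the recipe.

```python
import re

# One pattern for score_0 .. score_3; the digit is captured as group 1.
MODERATION_IMG = re.compile(
    r'<div class="moderation"><img src="http://tweakimg\.net/g/if/comments/score_([0-3])'
)


def label_score(match):
    """Prefix the matched moderation block with its score as plain text."""
    score = int(match.group(1))
    label = '0' if score == 0 else '+%d' % score
    # Keep the matched markup so a later pattern can still strip the block.
    return ' - moderated ' + label + match.group(0)
```

In the recipe this would be a single `(MODERATION_IMG, label_score)` entry in `preprocess_regexps`, placed before the pattern that removes the moderation div.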
Old 01-17-2012, 07:42 AM   #5
kovidgoyal
creator of calibre
Posts: 25,617
Karma: 4998447
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Done.