MobileRead Forums > E-Book Software > Calibre > Recipes
12-24-2011, 04:53 AM   #1
scissors
Addict
Posts: 204
Karma: 1001369
Join Date: Sep 2010
Device: prs300, kindle keyboard 3g
Recipe Shortlist UK

Shortlist doesn't provide RSS.
I just can't get my head around using BeautifulSoup ("soup") to extract the article links.
feed43.com let me build RSS feeds for the site instead.
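The "soup" approach the post sidesteps works roughly like this: the recipe fetches a section page, collects the article links, and returns them from a `parse_index` method as a list of `(section title, list of article dicts)` tuples, which calibre then uses instead of `feeds`. The sketch below covers only the link-collecting step and, so that it runs without calibre, uses Python's standard `html.parser` in place of BeautifulSoup. The `/article/` URL prefix, the base URL and the sample HTML are assumptions for illustration; the site's real markup would have to be checked.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (title, url) pairs for <a> tags whose href starts with a prefix."""

    def __init__(self, base_url, prefix):
        super().__init__()
        self.base_url = base_url
        self.prefix = prefix   # hypothetical: assumes article URLs share a path prefix
        self.links = []
        self._href = None      # href of the matching <a> currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href') or ''
            if href.startswith(self.prefix):
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ' '.join(''.join(self._text).split())  # collapse whitespace
            self.links.append((title, urljoin(self.base_url, self._href)))
            self._href = None


def build_section(section_title, html, base_url, prefix='/article/'):
    """Return one (section, articles) entry in the shape parse_index returns."""
    parser = LinkCollector(base_url, prefix)
    parser.feed(html)
    articles = [{'title': t, 'url': u, 'description': '', 'date': ''}
                for t, u in parser.links]
    return (section_title, articles)


# Made-up section page fragment; only the first link matches the prefix.
sample = '''
<ul>
  <li><a href="/article/style/1234/ten-shirts">Ten shirts  to buy</a></li>
  <li><a href="/about">About us</a></li>
</ul>
'''
print(build_section('Style', sample, 'http://www.shortlist.com'))
```

Inside a recipe, `parse_index(self)` would fetch each section page (for example with `self.index_to_soup(url)` and the real BeautifulSoup instead of this parser) and return a list of such tuples, one per section.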

The recipe produces a 7 MB file covering most of the site.
Personally, I think this is a good way for us non-Python folks to get feeds from difficult sites.

Anyone agree?

Here's the recipe



Code:
import re
from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1324663493(BasicNewsRecipe):
    title = u'Shortlist'
    oldest_article = 7
    max_articles_per_feed = 10
    remove_empty_feeds = True
    remove_javascript = True
    no_stylesheets = True
    __author__ = 'Dave Asbury'
    # last updated 24/12/11
    language = 'en_GB'

    cover_url = 'http://www.originalpenguin.eu/wp-content/uploads/2010/05/shortlist-cover.jpg'
    masthead_url = 'http://www.mediauk.com/logos/100/344096.png'

    #auto_cleanup_keep = '//*[@class="hero-image"]'
    #auto_cleanup_keep = '//*[@class="article "]'

    #auto_cleanup = True

    # Delete the text from "…or" up to the "email to your friends" link
    preprocess_regexps = [
        (re.compile(r'…or.*?email to your friends</a>.', re.IGNORECASE | re.DOTALL),
         lambda match: '')]

    # Only these elements are kept from each article page
    keep_only_tags = [
        dict(name='h1'),
        dict(name='h2', attrs={'class': 'title'}),
        dict(name='h3', attrs={'class': 'subheading'}),
        dict(attrs={'class': ['hero-static', 'stand-first']}),
        dict(attrs={'class': 'hero-image'}),
        dict(name='div', attrs={'id': ['list', 'article', 'article alternate']}),
        dict(name='div', attrs={'class': 'stand-first'}),
        #dict(name='p')
    ]

    # Sharing widgets and related-content boxes are removed
    remove_tags = [
        dict(name='h2', attrs={'class': 'graphic-header'}),
        dict(attrs={'id': ['share', 'twitter', 'facebook', 'digg', 'delicious', 'facebook-like']}),
        dict(attrs={'class': ['related-content', 'related-content-item',
                              'related-content horizontal', 'more']}),
    ]

    # Everything after the tags paragraph is dropped
    remove_tags_after = [dict(name='p', attrs={'id': 'tags'})]

    # RSS feeds generated with feed43.com from the site's section pages
    feeds = [
        (u'Instant Improver', u'http://feed43.com/1236541026275417.xml'),
        (u'Cool Stuff', u'http://feed43.com/6253845228768456.xml'),
        (u'Style', u'http://feed43.com/7217107577215678.xml'),
        (u'Films', u'http://feed43.com/3101308515277265.xml'),
        (u'Music', u'http://feed43.com/2416400550560162.xml'),
        (u'TV', u'http://feed43.com/4781172470717123.xml'),
        (u'Sport', u'http://feed43.com/5303151885853308.xml'),
        (u'Gaming', u'http://feed43.com/8883764600355347.xml'),
        (u'Women', u'http://feed43.com/2648221746514241.xml'),
        #(u'Articles', u'http://feed43.com/3428534448355545.xml')
    ]
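The `preprocess_regexps` rule in the recipe is applied to each downloaded page's HTML before it is parsed, deleting the text from "…or" through the "email to your friends" link and the one character after it. One way to check such a pattern outside calibre is plain `re.sub`; the HTML fragment below is made up for illustration and is not taken from the Shortlist site.

```python
import re

# Same pattern and flags as the recipe's preprocess_regexps entry.
pattern = re.compile(r'…or.*?email to your friends</a>.', re.IGNORECASE | re.DOTALL)

# Made-up fragment: the newline shows why DOTALL is needed, and the
# capital "E" shows why IGNORECASE is needed.
sample = ('<p>Great piece on shirts.</p>'
          '<p>Share it on Twitter…or\n<a href="/send">Email to your friends</a>.</p>')

# calibre passes the match object to the replacement callable; this one returns ''.
cleaned = pattern.sub(lambda match: '', sample)
print(cleaned)  # <p>Great piece on shirts.</p><p>Share it on Twitter</p>
```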

Last edited by scissors; 12-24-2011 at 01:32 PM.
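Each entry in `keep_only_tags` and `remove_tags` is a BeautifulSoup-style search spec: `name` selects the tag, and `attrs` maps an attribute to one accepted value or a list of accepted values. The function below is a hypothetical, simplified matcher written only to illustrate that logic; it is not calibre's or BeautifulSoup's code, and it ignores details such as BeautifulSoup 4 treating `class` as a multi-valued attribute. Like BeautifulSoup's keyword arguments, it treats unrecognised keys as attribute filters, so in this sketch a misspelled key such as `atts` quietly matches nothing instead of raising an error.

```python
def matches(tag_name, tag_attrs, spec):
    """Return True if an element satisfies a BeautifulSoup-style spec dict.

    Hypothetical, simplified illustration only: exact string comparison,
    with a list meaning "any of these values".
    """
    if 'name' in spec and spec['name'] != tag_name:
        return False
    # Attribute filters come from 'attrs' plus any other keys in the spec.
    filters = dict(spec.get('attrs', {}))
    filters.update({k: v for k, v in spec.items() if k not in ('name', 'attrs')})
    for attr, wanted in filters.items():
        accepted = wanted if isinstance(wanted, list) else [wanted]
        if tag_attrs.get(attr) not in accepted:
            return False
    return True


# Specs modelled on the recipe's entries; the elements tested are made up.
subheading = dict(name='h3', attrs={'class': 'subheading'})
typo = dict(name='h3', atts={'class': 'subheading'})
article_div = dict(name='div', attrs={'id': ['list', 'article', 'article alternate']})

print(matches('h3', {'class': 'subheading'}, subheading))  # True
print(matches('h3', {'class': 'subheading'}, typo))        # False: looks for an 'atts' attribute
print(matches('div', {'id': 'article'}, article_div))      # True
print(matches('div', {'id': 'sidebar'}, article_div))      # False
```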
12-24-2011, 01:51 PM   #2
scissors
Recipe updated: it now gets all of the site. Merry Xmas (check out the Women section!)


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.