05-16-2012, 11:32 AM   #1
lydgate
Device: Kindle Touch 3g
My first recipe (osnews.com)

Hi everyone,

I got a Kindle about a month ago and downloaded Calibre, and so far I really love it. I've been playing with the News recipes and think they're great, but I wanted to be able to write my own or modify the built-in ones. I figured the best way to learn was just to make one, so I wrote a recipe for OSNews, a site I read pretty regularly.

It took a little while to get working, because the print version of OSNews requires a Referer header. I managed to hunt down some code in one of the built-in recipes that shows how to forge it.
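For anyone curious what forging the header amounts to, here is a minimal sketch of the idea. It uses Python's standard urllib.request as a stand-in rather than mechanize, so it only illustrates the technique and is not what calibre does internally; build_referer_opener is a hypothetical helper name.

```python
import urllib.request


def build_referer_opener(referer):
    """Return an opener that sends a fixed Referer header with every request."""
    opener = urllib.request.build_opener()
    # Extend the default header list so the stock User-agent entry is kept.
    opener.addheaders = list(opener.addheaders) + [('Referer', referer)]
    return opener


opener = build_referer_opener('http://www.osnews.com/')
# No request is made here; this just shows the headers that would be sent.
print(opener.addheaders)
```

Appending to the existing header list, rather than replacing it, avoids dropping headers the opener already sets.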

I also figured out how to get rid of the annoying bracketed URL text that the site inserts into the article body, using preprocess_regexps.
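Here is a small standalone sketch of that kind of substitution, outside of calibre. The sample sentence is made up, and I'm assuming the inserted text always looks like " [http...]"; strip_url_notes is just a hypothetical helper name. The pattern is non-greedy so that two bracketed URLs on the same line are removed separately instead of swallowing the text between them.

```python
import re

# A compiled pattern plus a function that returns the replacement text,
# the same shape as an entry in preprocess_regexps.
URL_NOTE = re.compile(r' \[http.*?\]', re.IGNORECASE)


def strip_url_notes(html):
    """Remove ' [http://...]' notes from a chunk of text."""
    return URL_NOTE.sub(lambda m: '', html)


# Made-up sample text; the real print pages may differ.
sample = ('Read the review [http://example.com/review] '
          'and the FAQ [http://example.com/faq] first.')
cleaned = strip_url_notes(sample)
print(cleaned)
```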

I then had a problem because auto_cleanup inserted a </p> before each <a>, causing unwanted paragraph breaks wherever there was a link. I turned off auto_cleanup and used keep_only_tags instead, which seems to work better (I don't know why).

There are still a few issues, though. At first I had it downloading just the most recent RSS feed, and that worked fine, but now I'm trying to download three sections, and I'm not sure how to get them divided up the way many mobis are (e.g. NY Times or Ars Technica).

Also, although auto_cleanup caused problems, when it was on it also removed the <a> tags in the title, which I think looks better. I'm not sure how to do this with auto_cleanup off.
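One possible approach, sketched with plain regular expressions so it can be tried outside calibre: unwrap the links only inside the title div. This assumes the title sits in a div written exactly as class="printtitle" with no nested divs, which I haven't verified against the site. TITLE_DIV, ANCHOR_TAGS and unwrap_title_links are hypothetical names.

```python
import re

# Assumed markup: <div class="printtitle">...</div> with no nested divs.
TITLE_DIV = re.compile(r'(<div class="printtitle">)(.*?)(</div>)',
                       re.IGNORECASE | re.DOTALL)
# Matches opening and closing <a> tags, but not e.g. <abbr>.
ANCHOR_TAGS = re.compile(r'</?a\b[^>]*>', re.IGNORECASE)


def unwrap_title_links(html):
    """Drop <a> tags inside the title div, keeping the link text."""
    def fix(match):
        inner = ANCHOR_TAGS.sub('', match.group(2))
        return match.group(1) + inner + match.group(3)
    return TITLE_DIV.sub(fix, html)


# Made-up sample page fragment.
page = ('<div class="printtitle">'
        '<a href="http://www.osnews.com/story/1">Example headline</a></div>'
        '<div class="printcontent"><p>See <a href="http://example.com/">this</a>.</p></div>')
result = unwrap_title_links(page)
print(result)
```

Because it is a compiled pattern plus a function taking the match, the same pair has the shape that preprocess_regexps expects, and links in the article body are left alone.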

Also, the byline sits a bit too close to the text. Ideally I'd like it formatted differently, the way it is in the NYT.

Here's the code:
Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe


class AdvancedUserRecipe1336752090(BasicNewsRecipe):
    title = u'OSNews'
    oldest_article = 7  # days
    max_articles_per_feed = 100
    auto_cleanup = False

    # Each (title, URL) pair is fetched as a separate feed.
    feeds = [(u'Editorials', u'http://www.osnews.com/feed/kind/Editorial'),
             (u'Features', u'http://www.osnews.com/feed/kind/Feature'),
             (u'Interviews', u'http://www.osnews.com/feed/kind/Interview')]

    # Strip the bracketed URL text the print version inserts. The match is
    # non-greedy so it stops at the first closing bracket.
    preprocess_regexps = [(re.compile(r' \[http.*?\]', re.IGNORECASE), lambda m: '')]

    # Keep only the print-page containers.
    keep_only_tags = [dict(name='div', attrs={'class': 'printitem'}),
                      dict(name='div', attrs={'class': 'printtitle'}),
                      dict(name='div', attrs={'class': 'printcontent'})]

    def get_browser(self):
        # Keep calibre's own browser (the original overwrote it with a bare
        # mechanize opener) and add the Referer the print pages require.
        br = BasicNewsRecipe.get_browser(self)
        br.addheaders += [('Referer', 'http://www.osnews.com/')]
        return br

    def print_version(self, url):
        # Replace only the first 'story' so a title slug containing the word
        # is left alone.
        return url.replace('story', 'print', 1)


Go easy on me, it's my first one!

Last edited by lydgate; 05-16-2012 at 11:34 AM.