MobileRead Forums - Custom recipes

TonytheBookworm · 09-11-2010, 02:46 PM

Quote:

Originally Posted by willswords

Hi there. I'm new to Calibre and was wondering if someone could help me with my recipe for the Deseret News (a Salt Lake City, Utah, USA newspaper, http://desnews.com). I've cobbled something together from what I have seen in other recipes, but I can't get it to use the mobile URL instead of the regular one. The stories come through, but with all the extra stuff I don't want. The mobile versions of the articles look pretty clean, but I must be doing something wrong, because the recipe isn't using the mobile URL for the stories.

Here is what I have so far:


Here you go. Take note of the comments in the following code:


Code:

from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1284222826(BasicNewsRecipe):
    title          = u'Deseret News mobile'
    __author__ =  'WillsWords'
    description = 'Deseret News selected feeds'
    category = 'news, politics, USA, Utah'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    # I ADDED KEEP_ONLY_TAGS to keep only the content section of the mobile page
    keep_only_tags     = [dict(name='div', attrs={'id':['content']})]
    # I ADDED REMOVE_TAGS TO GET RID OF THE COMMENTS AND THE TOOLBAR AT THE TOP
    remove_tags = [dict(name='div', attrs={'id':['tools','story-comments']})]
                          
    masthead_url = "http://www.deseretnews.com/media/img/icons/dn-masthead-logo.gif"

    feeds = [
        (u'Top News', u'http://www.deseretnews.com/home/index.rss'),
        (u'Utah', u'http://www.deseretnews.com/utah/index.rss'),
        (u'Movies', u'http://www.deseretnews.com/movies/index.rss'),
        (u'LDS Newsline', u'http://www.deseretnews.com/ldsnews/index.rss'),
        (u'Sports', u'http://www.deseretnews.com/sports/index.rss'),
    ]

    # I FIXED YOUR INDENT: it was all the way to the left, but it has to be inside the class,
    # so align it with the indent of title, remove_javascript, etc.
    
    def print_version(self, url):
        # Example of a link to convert:
        # http://www.deseretnews.com/article/700064426/Elizabeth-Smarts-father-joins-bike-ride-to-lobby-for-laws-protecting-children-from-predators.html
        # should become:
        # http://www.deseretnews.com/mobile/article/700064426/Elizabeth-Smarts-father-joins-bike-ride-to-lobby-for-laws-protecting-children-from-predators.html
        #
        # url.split("/") gives ['http:', '', host, 'article', article id, file name]
        split1 = url.split("/")
        url3 = split1[2]
        url4 = split1[3]
        url5 = split1[4]
        url6 = split1[5]

        print_url = 'http://' + url3 + '/mobile/' + url4 + '/' + url5 + '/' + url6
        # I ADDED THE FOLLOWING TO SHOW YOU IN THE LOG FILE WHAT THE ACTUAL PRINT URL IS. Once it shows
        # the correct URL, you should be good to go, other than cleaning up a few tags with
        # keep_only_tags and remove_tags. (self.log works in both older and newer calibre versions,
        # unlike the Python 2 print statement.)
        self.log('THIS URL WILL PRINT: ', print_url)
        return print_url
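The `split("/")` approach works for links shaped exactly like the example, but it raises an `IndexError` if a link has fewer than six slash-separated parts, always rebuilds the link with `http://`, and silently drops any path segments past the sixth. A slightly sturdier sketch uses Python 3's standard `urllib.parse`. The helper name `to_mobile_url` is hypothetical, and it assumes, based only on the example link in the recipe, that article paths start with `/article/` and that the mobile page lives at the same path under `/mobile/`.

```python
from urllib.parse import urlsplit, urlunsplit


def to_mobile_url(url):
    """Hypothetical helper: map a Deseret News article URL to its mobile version.

    Assumption: article paths start with /article/ and the mobile page is the
    same path under /mobile/ (as in the example link in the recipe).
    URLs that don't match are returned unchanged.
    """
    parts = urlsplit(url)
    if not parts.path.startswith('/article/'):
        return url
    # Keep scheme, host, query and fragment; only prefix the path.
    return urlunsplit(parts._replace(path='/mobile' + parts.path))


if __name__ == '__main__':
    print(to_mobile_url(
        'http://www.deseretnews.com/article/700064426/'
        'Elizabeth-Smarts-father-joins-bike-ride-to-lobby-for-laws-protecting-children-from-predators.html'))
```

Inside the recipe, `print_version` could then just `return to_mobile_url(url)`; links that don't fit the assumed pattern are passed through unchanged instead of raising an error.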
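To see roughly what the two tag filters in the recipe do to a page, here is an illustration using only Python 3's `xml.etree.ElementTree`: first keep only `div#content`, then strip `div#tools` and `div#story-comments` out of what was kept. This is not calibre's code (calibre parses real, messy HTML itself), and the page markup is a made-up, simplified stand-in for the mobile article page.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified markup standing in for a mobile article page.
PAGE = """
<html><body>
  <div id="header">Site navigation</div>
  <div id="content">
    <div id="tools">Share | Print | Email</div>
    <h1>Headline</h1>
    <p>Story text.</p>
    <div id="story-comments">Reader comments</div>
  </div>
  <div id="footer">Footer links</div>
</body></html>
"""

KEEP_IDS = {'content'}                    # like keep_only_tags in the recipe
REMOVE_IDS = {'tools', 'story-comments'}  # like remove_tags in the recipe


def clean(page):
    body = ET.fromstring(page.strip()).find('body')
    # Step 1: collect only the <div>s whose id is in KEEP_IDS into a fresh <body>.
    kept = [el for el in body.iter('div') if el.get('id') in KEEP_IDS]
    new_body = ET.Element('body')
    new_body.extend(kept)
    # Step 2: remove unwanted <div>s anywhere inside what was kept.
    for parent in list(new_body.iter()):
        for child in list(parent):
            if child.tag == 'div' and child.get('id') in REMOVE_IDS:
                parent.remove(child)
    return new_body


if __name__ == '__main__':
    print(ET.tostring(clean(PAGE), encoding='unicode'))
```

Because the keep step runs first, the remove list only has to name clutter that sits inside the kept section, such as the toolbar and the comments.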