Thread: Vanity Fair
Old 10-11-2010, 11:11 AM   #2
TonytheBookworm
Quote:
Originally Posted by sdow1
I created a (fairly basic) recipe for Vanity Fair, and it seems to work pretty well as far as getting the full article content without "too much" extraneous stuff, but I would love if someone else who is better at this than I wants to run with it (i.e., adding covers, cleaning it up further, etc.). (Note, I also don't have the fourth VF RSS feed, relating to their Soccer Blog, in here, because I had no interest in it, but it obviously might be of interest to a more general audience)

Thanks!
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1283352306(BasicNewsRecipe):
    title                 = u'Vanity Fair'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    use_embedded_content  = False

    feeds = [
        (u'The Latest From Vanity Fair.com', u'http://www.vanityfair.com/services/rss/feeds/everything.xml'),
        (u'VF Daily Blog', u'http://www.vanityfair.com/online/daily/rss.xml'),
        (u"Wolcott's Blog", u'http://www.vanityfair.com/online/wolcott/rss.xml'),
    ]

    def print_version(self, url):
        # Request the printer-friendly version of each article
        return url + '?printable=true'
Use remove_tags.
For instance, to get rid of the print options at the top, use this in your code. I always put it before the feeds section, but you can put it pretty much anywhere inside the class block; just make sure your indents are correct.

Inspecting the page with Firebug in Firefox shows that the element you want to remove is
<div id="printoptions">, so the line below will get rid of it.

Spoiler:

Code:
remove_tags = [dict(name='div', attrs={'id': ['printoptions']})]
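
To show where that line sits, here is a rough sketch of the quoted recipe with remove_tags placed just before the feeds. The class name VanityFairSketch is made up for illustration, only one feed is listed to keep it short, and the try/except stand-in exists only so the snippet can be checked outside calibre (inside calibre the real BasicNewsRecipe is imported).

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in so the snippet can be loaded outside calibre;
    # inside calibre the real base class is imported above.
    class BasicNewsRecipe(object):
        pass


class VanityFairSketch(BasicNewsRecipe):  # hypothetical class name
    title = u'Vanity Fair'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    use_embedded_content = False

    # Drop the print-options bar seen in Firebug: <div id="printoptions">
    remove_tags = [dict(name='div', attrs={'id': ['printoptions']})]

    feeds = [
        (u'The Latest From Vanity Fair.com',
         u'http://www.vanityfair.com/services/rss/feeds/everything.xml'),
    ]

    def print_version(self, url):
        # Request the printer-friendly version of each article
        return url + '?printable=true'
```

Every attribute is indented one level inside the class, and remove_tags is a list, so more dict(...) entries can be appended as you find other elements to strip.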


As for the cover, it depends on which cover you want to use. Again, use Firebug in Firefox to figure out which element of the page (soup) you want to use as your image source. For instance, let's say our cover is in the
<div class="spread-image">; we would use something like this to get the image as the cover.

Spoiler:

Code:
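def get_cover_url(self):
    cover_url = None
    # self.index must be defined in the class as the URL of the page showing the cover
    soup = self.index_to_soup(self.index)
    cover_item = soup.find('div', attrs={'class': 'spread-image'})
    if cover_item:
        # Prefix the site host; this assumes the img src is a site-relative path
        cover_url = 'http://www.vanityfair.com' + cover_item.a.img['src']
    return cover_url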
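
One thing to watch: gluing a host name onto the src only works if the src is a site-relative path. A small sketch of a safer way, using urljoin from the standard library. The helper name absolute_cover_url and the image paths in the comments are made up for illustration; check the real src values in Firebug.

```python
from urllib.parse import urljoin


def absolute_cover_url(page_url, src):
    """Resolve an <img src> value against the page it was found on.

    Hypothetical helper: relative paths are resolved against page_url,
    while an src that is already a full URL is returned unchanged.
    """
    return urljoin(page_url, src)


# Example calls (the paths are invented, not taken from the real site):
# absolute_cover_url('http://www.vanityfair.com/magazine/', '/images/cover.jpg')
# absolute_cover_url('http://www.vanityfair.com/magazine/', 'http://cdn.example.com/c.jpg')
```

Inside get_cover_url you could then write cover_url = absolute_cover_url(self.index, cover_item.a.img['src']) instead of concatenating strings.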



If, however, you just want a static cover (one that never changes), then simply put the following in your recipe:
Code:
cover_url = 'PUT THE URL TO THE IMAGE HERE'
and that's it.

Good luck, and let me know if you need any help. Just post your code and indicate where you seem to be having issues. Also, wrap your code in spoiler and code tags, like ([spoiler])([code]) put your code in here ([/code])([/spoiler]), without the ()'s of course. This will keep the thread cleaner and keep the formatting correct, because Python is picky about indents.