09-25-2010, 12:51 AM   #2
TonytheBookworm
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Could someone please test this code on their end and see if they get any junk? I don't want to keep submitting what I believe is working code to Kovid, only to look like a moron when the output turns out looking like crap.

Thanks....

Code:
import re

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7 # change this if you want more current articles. I like to go a week in
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = True
    
    masthead_url = 'http://www.raytheon.com/newsroom/rtnwcm/groups/Public/documents/masthead/rtn08_popscidec_masthead.jpg'
    
               
    feeds = [
        ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
        ('Cars', 'http://www.popsci.com/full-feed/cars'),
        ('Science', 'http://www.popsci.com/full-feed/science'),
        ('Technology', 'http://www.popsci.com/full-feed/technology'),
        ('DIY', 'http://www.popsci.com/full-feed/diy'),
    ]

    
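    # The following gets rid of the "Gallery:" links when found.
    def preprocess_html(self, soup):
        # Debug output: print the parsed page.
        print('SOUP IS: %s' % soup)
        # findAll() always returns a list, so no None check is needed.
        for link in soup.findAll(['head', 'h2']):
            # The pattern is matched against the tag's full markup.
            if re.search('Gallery:', str(link)):
                # Drop the whole parent element of the matching tag.
                link.parent.extract()
        return soup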
  #-----------------------------------------------------------------
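
If anyone wants to check the Gallery-stripping idea without running a full download, here is a rough standalone sketch. It is not the recipe code: it uses the standard library's ElementTree instead of calibre's BeautifulSoup, it assumes well-formed markup, and it only looks at h2 text rather than the tag's full markup. The function name strip_gallery_blocks and the sample HTML are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

GALLERY_RE = re.compile(r'Gallery:')


def strip_gallery_blocks(xhtml):
    """Remove the parent element of every <h2> whose text contains 'Gallery:'."""
    root = ET.fromstring(xhtml)
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # Collect the blocks to drop before mutating the tree.
    doomed = []
    for h2 in root.iter('h2'):
        text = ''.join(h2.itertext())
        if GALLERY_RE.search(text) and h2 in parent_of:
            doomed.append(parent_of[h2])

    for block in doomed:
        grandparent = parent_of.get(block)
        # Skip the root element and blocks that were already removed.
        if grandparent is not None and block in list(grandparent):
            grandparent.remove(block)

    return ET.tostring(root, encoding='unicode')


# Hypothetical sample markup, not taken from the real feed.
SAMPLE = (
    '<body>'
    '<div><h2><a href="#">Gallery: Ten Robots</a></h2><p>photos</p></div>'
    '<div><h2>A Normal Story</h2><p>text</p></div>'
    '</body>'
)

cleaned = strip_gallery_blocks(SAMPLE)
print(cleaned)
```

The sketch collects the matching blocks first and removes them afterwards, so the tree is not modified while it is being walked. In the recipe itself, the same job is done by link.parent.extract().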


***Starson17 - I used the use_embedded_content flag, which I didn't know anything about until you mentioned it. It makes some feeds a lot easier. Thanks.