11-21-2009, 07:19 AM   #875
JayCeeEll
remove_tags not removing tags

I am working on some new recipes and I am having trouble with the remove_tags pre-processing routine.

The following recipe should download only the blog entry and its comments, but I am also getting the sidebar contents. What am I doing wrong?

An example article is http://www.badscience.net/2009/11/oh-that-was-quick/

Code:
__license__   = 'GPL v3'
__copyright__ = '2009, JayCeeEll'

from calibre.web.feeds.news import BasicNewsRecipe

class BadScience(BasicNewsRecipe):
    title                 = u'Bad Science'
    language              = 'en'
    __author__            = 'JayCeeEll'
    description           = 'Bad science in the media'
    author                = 'Ben Goldacre'
    publisher             = 'Ben Goldacre'
    category              = 'blog, skepticism'
    oldest_article        = 7
    max_articles_per_feed = 100
    no_stylesheets        = True
    encoding              = 'utf8'
    remove_javascript     = True
    use_embedded_content  = False

    keep_only_tags = [dict(name='div', attrs={'class':'padded'})]

    remove_tags = [
                    dict(name='p', attrs={'class':'meta'})
                   ,dict(name='div', attrs={'id':'respond'})
                   ,dict(name='div', attrs={'id':'sidebar_right'})
                  ]

    feeds = [(u'Bad Science', u'http://www.badscience.net/feed/')]
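
One way to narrow this down might be to check whether each remove_tags entry matches anything in the downloaded HTML at all. The following is only a rough sketch using the Python standard library, not calibre's own matching code. The SelectorCounter class, the count_matches helper and the SAMPLE markup are all made up for illustration (the sample is not copied from the real page), and the attribute comparison is a plain exact-string match, which is an assumption and may differ from what calibre does internally.

```python
from html.parser import HTMLParser

# Same selector specs as in the recipe's remove_tags list.
REMOVE_TAGS = [
    dict(name='p', attrs={'class': 'meta'}),
    dict(name='div', attrs={'id': 'respond'}),
    dict(name='div', attrs={'id': 'sidebar_right'}),
]


class SelectorCounter(HTMLParser):
    """Hypothetical helper: count start tags matching each selector spec."""

    def __init__(self, specs):
        super().__init__()
        self.specs = specs
        self.counts = [0] * len(specs)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for i, spec in enumerate(self.specs):
            if tag != spec['name']:
                continue
            # Assumption: exact string equality on every listed attribute.
            if all(attrs.get(k) == v for k, v in spec['attrs'].items()):
                self.counts[i] += 1


def count_matches(html, specs):
    """Return one match count per spec, in the same order as specs."""
    parser = SelectorCounter(specs)
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented markup for illustration only. The last id deliberately differs
# from the spec ('sidebar-right' vs 'sidebar_right') to show a zero count.
SAMPLE = """
<div class="padded">
  <p class="meta">Posted by someone</p>
  <div id="respond">Leave a reply</div>
  <div id="sidebar-right">Links</div>
</div>
"""

if __name__ == '__main__':
    for spec, n in zip(REMOVE_TAGS, count_matches(SAMPLE, REMOVE_TAGS)):
        print(spec, '->', n, 'match(es)')
```

Feeding the saved source of a real article into count_matches in place of SAMPLE would show which entries never match; a count of zero would suggest the tag name, id or class in that entry does not correspond to the page markup.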