06-19-2013, 11:20 AM   #1
JeffreyZhao
Junior Member
Posts: 4
Karma: 10
Join Date: Jun 2013
Device: Kindle Paperwhite
"keep_only_tags" doesn't work?

I'm using the following test recipe to crawl infoq.com:

Code:
from calibre.web.feeds.news import BasicNewsRecipe


class InfoQ_Test(BasicNewsRecipe):
    title = u'InfoQ Test'
    auto_cleanup = True
    no_stylesheets = True

    # Intended to keep only the element with id="content"
    keep_only_tags = [dict(id=['content'])]

    def parse_index(self):
        # Two hard-coded articles, returned as a single feed named "Default"
        items = []

        items.append({'title': 'Article1', 'url': 'http://www.infoq.com/news/2013/06/stratos-2'})
        items.append({'title': 'Article2', 'url': 'http://www.infoq.com/news/2013/06/document-messaging-analysis'})

        return [("Default", items)]
I want to keep only the "div" with id="content" from the whole page, but calibre just removes all the elements under "body". If I remove the "keep_only_tags" setting, the article content is fetched successfully, but I'd like to know why it doesn't work with "keep_only_tags".
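
To make the expected result concrete, here is a minimal standalone sketch of the filtering described above: keep one element by id and drop everything else. It uses only Python's standard html.parser and is not calibre's implementation. The names KeepById and keep_only_id and the sample page are made up for illustration, and the sketch assumes well-nested HTML.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class KeepById(HTMLParser):
    """Collect the markup of the first element whose id equals wanted_id."""

    def __init__(self, wanted_id):
        super().__init__(convert_charrefs=False)
        self.wanted_id = wanted_id
        self.depth = 0      # nesting depth inside the kept element (0 = outside)
        self.done = False   # True once the kept element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.depth == 0:
            if self.done or dict(attrs).get("id") != self.wanted_id:
                return
        self.parts.append(self.get_starttag_text())
        if tag in VOID_TAGS:
            # A void element that matched at top level is the whole result.
            self.done = self.done or self.depth == 0
        else:
            self.depth += 1

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied only inside the kept element.
        if self.depth > 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.depth == 0 or tag in VOID_TAGS:
            return
        self.parts.append("</%s>" % tag)
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.depth > 0:
            self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        if self.depth > 0:
            self.parts.append("&#%s;" % name)


def keep_only_id(html, wanted_id):
    """Return the markup of the element with the given id, or '' if absent."""
    parser = KeepById(wanted_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


# Hypothetical sample page, not the real infoq.com markup.
page = ('<html><body><div id="header">Menu</div>'
        '<div id="content"><p>Article <b>text</b><br></p></div>'
        '<div id="footer">Footer</div></body></html>')
print(keep_only_id(page, "content"))
# prints: <div id="content"><p>Article <b>text</b><br></p></div>
```

If no element has the requested id, the sketch returns an empty string, which resembles the empty "body" seen with the recipe.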

Thanks