Quote:
Originally Posted by swmkdr
Okay, so I tried to make an AVClub website recipe by customising the bbc one. It seems to work fine, but I need some help removing all the extra stuff - headers, sidebar, images etc. This is what the recipe looks like at the moment:
...
Any help would be appreciated. Thanks in advance.
Set your keep_only_tags and remove_tags to this:
Code:
# Keep only the main article container.
keep_only_tags = [
    dict(name='div', attrs={'id': 'content'}),
]
# Then drop these blocks from what was kept.
remove_tags = [
    dict(name='div', attrs={'class': ['footer', 'tools_horizontal']}),
    dict(name='div', attrs={'id': ['tool_holder', 'elsewhere_on_avclub']}),
]
This was only tested on one article, so you'll need to test the others.
As an aside, it is easier to help when someone has already attempted the recipe and posts it with feeds, etc. I'm also more inclined to help when they've done as much as they can. In this case, as in many cases, all that was needed was to open the article in Firefox and use Firebug to identify the class, div, id, etc. of the elements that should be kept or removed.
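If you'd rather check several articles without clicking through each one in the browser, something like the following might help. It is only a rough sketch using Python's standard html.parser, not anything calibre provides: it counts the id and class values found on div tags in a saved page, so you can see whether names like 'content' or 'footer' turn up on every article. The DivInventory class, the inventory() helper, and the sample HTML are all made up for illustration; the real AVClub markup will differ.

```python
from collections import Counter
from html.parser import HTMLParser


class DivInventory(HTMLParser):
    """Count the id and class values that appear on <div> tags."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        attr_map = dict(attrs)
        if attr_map.get('id'):
            self.ids[attr_map['id']] += 1
        # A class attribute may hold several space-separated names.
        for name in (attr_map.get('class') or '').split():
            self.classes[name] += 1


def inventory(html):
    """Return (ids, classes) Counters for the div tags in an HTML string."""
    parser = DivInventory()
    parser.feed(html)
    parser.close()
    return parser.ids, parser.classes


# Hypothetical, heavily simplified stand-in for a saved article page.
sample = """
<div id="header">site navigation</div>
<div id="content">
  <div class="tools_horizontal">share links</div>
  <p>Article text</p>
  <div class="footer">related links</div>
</div>
"""

ids, classes = inventory(sample)
print(sorted(ids))      # ['content', 'header']
print(sorted(classes))  # ['footer', 'tools_horizontal']
```

Run it on the saved HTML of a few different articles and compare the output. Any id or class that shows up on all of them is a reasonable candidate for keep_only_tags or remove_tags.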