05-01-2010, 05:16 PM   #1877
Starson17
Quote:
Originally Posted by swmkdr
Okay, so I tried to make an AVClub website recipe by customising the bbc one. It seems to work fine, but I need some help removing all the extra stuff - headers, sidebar, images etc. This is what the recipe looks like at the moment:
...

Any help would be appreciated. Thanks in advance.
Change your keep_only_tags and remove_tags to this:
Code:
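    # Keep only the div with id="content".
    keep_only_tags = [dict(name='div', attrs={'id': 'content'})]

    # Then strip these divs (matched by class or by id) from what was kept.
    remove_tags = [
        dict(name='div', attrs={'class': ['footer', 'tools_horizontal']}),
        dict(name='div', attrs={'id': ['tool_holder', 'elsewhere_on_avclub']}),
    ]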
This was only tested on one article, so you'll need to test the others.

As an aside, it's easier to help when someone has already attempted the recipe and posts it with the feeds, etc. I'm also more inclined to help if they've done as much as they can themselves. In this case, as in many cases, all that was needed was to open the article in Firefox and use Firebug to identify the tag name, class, and id of the elements that should be kept or removed.
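
If you'd rather get a quick list than click through Firebug, something like the following might help. It's only a rough sketch using Python's standard html.parser, not anything from calibre: DivInventory, list_divs and the sample markup are names I made up for illustration, and the real AVClub page will look different. It just tallies the id and class values of every div so you can see candidates for keep_only_tags and remove_tags.

```python
from collections import Counter
from html.parser import HTMLParser


class DivInventory(HTMLParser):
    """Collect the id and class values of every <div> in a page."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        for name, value in attrs:
            if not value:
                continue
            if name == 'id':
                self.ids[value] += 1
            elif name == 'class':
                # A class attribute may hold several space-separated names.
                self.classes.update(value.split())


def list_divs(html):
    """Return (ids, classes) counters for all divs in the given HTML."""
    parser = DivInventory()
    parser.feed(html)
    parser.close()
    return parser.ids, parser.classes


# Hypothetical, cut-down page; the real markup will differ.
sample = """
<div id="header">nav</div>
<div id="content">
  <p>Article text</p>
  <div class="tools_horizontal share">share links</div>
  <div class="footer">footer</div>
</div>
"""

ids, classes = list_divs(sample)
print(sorted(ids))
print(sorted(classes))
```

To try it on a real article, save the page source to a file and pass its contents to list_divs. You still need to look at the page itself to decide which of the listed ids and classes hold the article body and which are clutter.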