03-06-2010, 09:04 PM   #1544
Starson17
Wizard
Quote:
Originally Posted by gabe973
I get an error saying that "re" is not defined.
Sorry, you also need "import re" after "from calibre.web.feeds.news import BasicNewsRecipe"
One option would be to use the same mechanism, preprocess_regexps, to strip tags instead of remove_tags.

So you can say:

Code:
    preprocess_regexps = [
        # non-greedy .*? stops at the first closing marker, not the last one on the page
        (re.compile(r'<!--.*?-->', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<div class="something".*?</div>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        ]
to strip every block that starts with <div class="something" and runs through a closing </div>.
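One thing to watch with patterns like these: with re.DOTALL, a greedy .* runs from the first opening marker to the last closing marker in the page, so it can swallow content you wanted to keep. Here is a rough, standalone sketch of the difference using only the re module. The sample HTML and the apply_rules helper are made up for illustration; they only imitate applying a list of (pattern, function) pairs in order and are not calibre's own code.

```python
import re

# Made-up sample page: two comments, one unwanted div, three bits worth keeping
sample = (
    '<p>keep 1</p><!-- first -->'
    '<div class="something">ad</div>'
    '<p>keep 2</p><!-- second -->'
    '<div class="other">keep 3</div>'
)

flags = re.DOTALL | re.IGNORECASE

# Non-greedy patterns: each match ends at the first closing marker
lazy_rules = [
    (re.compile(r'<!--.*?-->', flags), lambda match: ''),
    (re.compile(r'<div class="something".*?</div>', flags), lambda match: ''),
]

# Greedy patterns: each match ends at the last closing marker
greedy_rules = [
    (re.compile(r'<!--.*-->', flags), lambda match: ''),
    (re.compile(r'<div class="something".*/div>', flags), lambda match: ''),
]


def apply_rules(html, rules):
    """Hypothetical helper: run each (pattern, function) pair over the text in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


cleaned = apply_rules(sample, lazy_rules)
overstripped = apply_rules(sample, greedy_rules)

print(cleaned)
print(overstripped)
```

The non-greedy version still stops at the first </div> it meets, so a div with other divs nested inside it would only be partly removed. For nested markup, working on the soup is probably the safer route.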

Alternatively, work on the soup in postprocess_html: print the soup to make sure it contains the tags you want, then use findAll() to find them and extract() on each tag to strip it. There's always more than one way to skin a kangaroo.
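The soup approach could look something like the sketch below. The function name strip_tags and its default arguments are my own placeholders, not anything from calibre; the only calls it relies on are findAll() and extract() on the soup object.

```python
def strip_tags(soup, name='div', css_class='something'):
    """Remove every <name class="css_class"> tag from a soup and return the soup.

    Hypothetical helper: 'div' and 'something' are placeholders for whatever
    tag and class you actually want to drop.
    """
    # findAll() returns a list of matches, so removing tags while looping is safe
    for tag in soup.findAll(name, attrs={'class': css_class}):
        # extract() detaches the tag, and everything inside it, from the tree
        tag.extract()
    return soup
```

In a recipe you would call this from postprocess_html(self, soup, first_fetch) and return the result, for example return strip_tags(soup). While testing, a print(soup) before and after the call shows whether the tags were actually found.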