Quote:
Originally Posted by gabe973
I get an error saying that "re" is not defined.
Sorry, you also need "import re" after "from calibre.web.feeds.news import BasicNewsRecipe"
One option would be to use the same preprocess_regexps setting to strip the tags, instead of remove_tags.
So you can say:
Code:
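preprocess_regexps = [
    # strip HTML comments; .*? is non-greedy, so each comment is matched on its own
    (re.compile(r'<!--.*?-->', re.DOTALL|re.IGNORECASE), lambda match: ''),
    # strip from <div class="something" up to the first closing </div>
    (re.compile(r'<div class="something".*?</div>', re.DOTALL|re.IGNORECASE), lambda match: ''),
]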
to strip every block that starts with <div class="something"
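One thing to watch with regexes like these is greedy versus non-greedy matching. With re.DOTALL, a plain .* runs on to the last --> (or last </div>) on the page, while .*? stops at the first one. Here is a small standalone sketch you can run outside calibre to see the difference; the sample HTML is made up purely for illustration:

```python
import re

# Made-up sample page, only for illustration
html = ('<p>keep 1</p><!-- ad start -->'
        '<div class="something">junk</div>'
        '<p>keep 2</p><!-- ad end -->'
        '<div class="other">keep 3</div>')

flags = re.DOTALL | re.IGNORECASE
greedy = re.compile(r'<!--.*-->', flags)   # matches first <!-- to LAST -->
lazy = re.compile(r'<!--.*?-->', flags)    # matches each comment separately
div_pattern = re.compile(r'<div class="something".*?</div>', flags)

greedy_result = greedy.sub('', html)       # "keep 2" is lost along with the comments
lazy_result = lazy.sub('', html)           # only the two comments are removed
div_result = div_pattern.sub('', lazy_result)

print(greedy_result)
print(lazy_result)
print(div_result)
```

Even the non-greedy div pattern stops at the first </div> it meets, so it will not cleanly remove a div that has other divs nested inside it. For that case the soup approach is probably the safer bet.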
Alternatively, work on the soup in postprocess_html: print the soup to check that it really contains the tags you want, then use findAll() to find them and extract() on each tag to strip it. There's always more than one way to skin a kangaroo.
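For the soup route, a rough sketch might look like the following. The helper name strip_divs and the class name 'something' are placeholders of mine, not anything built into calibre; the hook shown in the comment assumes the usual postprocess_html(self, soup, first_fetch) form, which hands you an already-parsed soup.

```python
def strip_divs(soup, class_name):
    """Remove every <div class=class_name> from the soup, in place.

    strip_divs is a hypothetical helper; soup is assumed to be a
    BeautifulSoup object like the one calibre passes to the recipe hooks.
    """
    # findAll() returns a list of matches, so extracting while looping is safe
    for tag in soup.findAll('div', attrs={'class': class_name}):
        tag.extract()  # detaches the tag and everything inside it
    return soup

# Inside the recipe class, the hook could then be something like:
#
#     def postprocess_html(self, soup, first_fetch):
#         print(soup.prettify())  # debugging: check the tags are really there
#         return strip_divs(soup, 'something')
```

Because extract() removes the whole tag, nested divs inside it go too, which is where this beats the regex version. Take the print out once you have confirmed it is finding the right tags.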