Quote:
Originally Posted by kovidgoyal
Code:
def preprocess_html(soup):
    for a in soup.findAll('a', href=True):
        a['href'] = ''
    return soup
When I try this, it strips everything out, i.e. I just end up with a book containing a cover page, a summary page and then two pages (one for each feed). Each feed page is empty apart from its title.
I suppose what I'm looking for is a way of filtering only when processing the feed link page.
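Something like the following is roughly what I have in mind. It is only a sketch: it keeps the signature from the quoted code, and looks_like_feed_page is a hypothetical helper of my own. The <div class="feed"> marker it tests for is an assumption, not something I know the generated pages contain, and I'm also assuming the feed link page is passed through preprocess_html at all.

```python
def looks_like_feed_page(soup):
    # Hypothetical check: assumes the feed link page carries a
    # <div class="feed"> element. Replace with whatever marks that page.
    return soup.find('div', attrs={'class': 'feed'}) is not None


def preprocess_html(soup):
    # Only touch the links when the page passes the check above;
    # every other page is returned unchanged.
    if looks_like_feed_page(soup):
        for a in soup.findAll('a', href=True):
            a['href'] = ''
    return soup
```

The loop body is just the blanking from the quote; whatever filtering is actually wanted would go in its place.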