Perfect - thanks.
As an aside, I often use the following to remove hyperlinks. Is there an easy way to format the text from which a hyperlink was removed (perhaps with underlining), so that a visible trace remains showing there was a link in the source document?
def postprocess_html(self, soup, first_fetch):
    for a in soup.findAll('a', href=True):
        del a['href']
    return soup
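One approach that might work, as a minimal sketch: instead of only deleting the href, also rename the tag to <u> so the former link text stays underlined. This assumes the soup object behaves like BeautifulSoup 4, where assigning to tag.name renames the tag. The function name underline_removed_links and the sample HTML are hypothetical, for illustration only, and this has not been tested inside a calibre recipe.

```python
from bs4 import BeautifulSoup


def underline_removed_links(soup):
    """Strip hyperlinks but leave the former link text underlined."""
    for a in soup.find_all('a', href=True):
        del a['href']   # drop the link target
        a.name = 'u'    # rename the tag: <a>text</a> becomes <u>text</u>
    return soup


# Hypothetical sample input: one real link and one anchor without an href.
html = '<p>See <a href="http://example.com">this page</a> and <a id="top">anchor</a>.</p>'
soup = underline_removed_links(BeautifulSoup(html, 'html.parser'))
print(soup)
```

In a recipe, the same two lines inside the loop would presumably go in postprocess_html(self, soup, first_fetch). Anchors without an href are left alone, and any other attributes on the original link (title, target and so on) stay on the new <u> tag.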