Hi Kovid -
The NYTimes recipe has a collection of --> (sometimes twice) leftover in many articles, which seems to be an unmatched comment tag. I've tried replacing in preprocess and postprocess_html but can't seem to figure out how to do so. Any ideas the best way to remove this from articles?
Code:
def postprocess_html(self, soup, first_fetch):
findcomment = soup.findAll(text = re.compile('--gt&;'))
for comment in findcomment:
fixed_text = unicode(comment).replace('--gt&;', '')
comment.replace_with(fixed_text)
return soup
Doesn't seem to work and returns articles with nothing but the -->